The 2026 Guide to the Microsoft Foundry Tool Catalog

Khawar Habib | February 23, 2026 | 6 min read

Microsoft’s latest rebrand to "Foundry Tools" might be confusing, but it consolidates heavy hitters like Speech, Document Intelligence, and the new Content Understanding into a single, modular ecosystem. The real value lies in features like Voice Live for low-latency agents and automated PII redaction, though the overlapping capabilities between services can make choosing the right tool a bit of a headache. Just watch your budget closely: those "cheap" individual API calls for translation or data extraction can snowball into a massive monthly bill once you hit production scale.

Microsoft rebranded everything again. If you were comfortable calling them "Azure Cognitive Services" or even "Azure AI Services", well, now it is all under "Microsoft Foundry Tools." Same services, mostly same APIs, new name. I am honestly tired of keeping track but here we are, so let me walk you through what is actually in the catalog and what matters.

The Foundry Tool Catalog is basically Microsoft's collection of prebuilt AI services that you can plug into your apps without training anything yourself. Speech, vision, language, document processing, translation: all of it sits under one umbrella now. The idea is that you use these alongside Azure AI Foundry (their model deployment platform) to build agents, pipelines, whatever you need. On paper it makes sense. In practice, the naming is confusing and half my team still calls them Cognitive Services in pull requests.

What is actually in the catalog

Six main services, and they overlap more than Microsoft will admit.

Azure Speech is probably the most mature one. It covers speech-to-text, text-to-speech, real-time translation, and batch transcription. The new thing in 2026 is Voice Live, which is basically low-latency speech-to-speech for building voice agents. They have a dedicated SDK for it now in C#, Python, Java, and JavaScript. We tested the real-time transcription on a client's call center project last year, and honestly the accuracy was impressive for Urdu-English code-switching. Not perfect, but impressive. The fast transcription API is worth looking at if you are processing recordings in bulk, because it is significantly quicker than the old batch endpoint. Custom voice is still there too if you want branded TTS, though the approval process for that is slow.

Azure Face does detection, recognition, and liveness checks. The liveness detection is what most people actually need: verifying that a real human is in front of the camera, not a photo of someone's face. If you are building KYC flows or identity verification, this is the tool. But be careful with access. Microsoft gates face recognition features behind an approval form because of responsible AI concerns. I applied for a project and it took almost two weeks to get approved. Plan accordingly.

Content Understanding is the newer one and honestly the most interesting. It takes documents, images, audio, video, any content type, and turns it into structured data. Think of it as the general-purpose extraction tool. You can build custom analyzers or use prebuilt ones. The pro mode uses more compute and gives better results on complex documents. I see this replacing a lot of custom pipelines people have built. We used it for a RAG project where we needed to process mixed-format data (PDFs, scanned images, some audio recordings), and having one API handle all of it saved us from stitching together three different services. The studio interface is decent for prototyping.

Document Intelligence (formerly Form Recognizer; the Document Intelligence name has survived the move under Foundry Tools) is now at version 4.0. It extracts key-value pairs, tables, and text from documents, and it has prebuilt models for invoices, receipts, tax forms, and IDs. The custom model training is where it gets powerful: you label 5 to 10 samples and it learns your document structure. The overlap with Content Understanding is real and confusing. My rough rule: if it is a structured form or a known document type, use Document Intelligence. If it is mixed media or you need more flexibility, use Content Understanding.
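That rough rule is easy to write down as a routing function, which helps when a pipeline has to decide per file. This is only a sketch of the heuristic above: the `Item` fields, the service labels, and the `KNOWN_DOCUMENT_TYPES` set are hypothetical names made up for illustration, not official identifiers from either service.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for this sketch; not official service identifiers.
DOCUMENT_INTELLIGENCE = "document-intelligence"
CONTENT_UNDERSTANDING = "content-understanding"

# Document types the article mentions as having prebuilt models.
KNOWN_DOCUMENT_TYPES = {"invoice", "receipt", "tax_form", "id"}


@dataclass
class Item:
    media: str                       # "document", "image", "audio", or "video"
    doc_type: Optional[str] = None   # e.g. "invoice"; None when unknown
    has_custom_model: bool = False   # True if you trained a model for this layout


def choose_service(item: Item) -> str:
    """Apply the rule of thumb: known or structured forms go to Document
    Intelligence, mixed media and everything else to Content Understanding."""
    if item.media in {"audio", "video"}:
        return CONTENT_UNDERSTANDING
    if item.doc_type in KNOWN_DOCUMENT_TYPES or item.has_custom_model:
        return DOCUMENT_INTELLIGENCE
    return CONTENT_UNDERSTANDING
```

A scanned invoice (`Item("image", "invoice")`) lands on Document Intelligence, while an audio recording or an unclassified PDF falls through to Content Understanding. Treat the rule as a starting point and adjust it once you have compared results on your own documents.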

Azure Language covers NLP tasks: NER, PII detection, sentiment analysis, text summarization, question answering, and conversational language understanding. The PII detection is solid and we use it heavily for redacting personal data before sending text to LLMs. One thing: LUIS and QnA Maker are officially deprecated now, so if you are still on those, migrate to the new CLU and question answering services. Don't wait on this.
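For the redact-before-LLM step, a minimal sketch could look like the following. It assumes the `azure-ai-textanalytics` Python package, whose `TextAnalyticsClient.recognize_pii_entities` call returns one result per input with a `redacted_text` field. The `redact_pii` helper, the `BATCH_SIZE` value, and the fail-closed error handling are assumptions of this sketch, not something the service prescribes.

```python
from typing import Iterable, List

# Assumed per-request document limit for this sketch; check the current
# service limits for your resource before relying on it.
BATCH_SIZE = 5


def build_client(endpoint: str, key: str):
    """Create the Language client (requires `pip install azure-ai-textanalytics`)."""
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    return TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))


def redact_pii(client, texts: Iterable[str], language: str = "en") -> List[str]:
    """Return redacted copies of `texts`, in the same order as the input."""
    texts = list(texts)
    redacted: List[str] = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        for doc in client.recognize_pii_entities(batch, language=language):
            if doc.is_error:
                # Fail closed: never forward text that was not checked.
                raise RuntimeError(f"PII detection failed: {doc.error}")
            redacted.append(doc.redacted_text)
    return redacted
```

Usage would be along the lines of `safe = redact_pii(build_client(endpoint, key), prompts)` and then sending only `safe` to the model. Raising on any per-document error is a deliberate choice here: a skipped document is exactly the one that leaks.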

Azure Translator supports text and full document translation across a massive number of languages. The custom translator feature lets you train on your own terminology, useful for legal or medical content where generic translation is not good enough. They also have containers now if you need to run translation on-premises.
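To make the text translation call concrete, here is a small sketch that builds a request for the Translator v3 REST endpoint using only the standard library. The `/translate` path, the `api-version=3.0` and `to` parameters, the `category` parameter for custom models, and the `Ocp-Apim-Subscription-*` headers are part of the public v3 API; the helper names and the sample category value are made up for illustration, and the sketch assumes a key-based resource on the global endpoint.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request

GLOBAL_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(
    texts: List[str],
    to: List[str],
    key: str,
    region: str,
    category: Optional[str] = None,
) -> Request:
    """Build (but do not send) a Translator v3 text translation request."""
    params = [("api-version", "3.0")] + [("to", lang) for lang in to]
    if category:
        # Category ID of a custom translator model, if you trained one.
        params.append(("category", category))
    url = f"{GLOBAL_ENDPOINT}/translate?{urlencode(params)}"
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json; charset=UTF-8",
    }
    return Request(url, data=body, headers=headers, method="POST")


def extract_translations(response_json: list) -> List[List[str]]:
    """Pull the translated strings out of a parsed v3 response body."""
    return [[t["text"] for t in item["translations"]] for item in response_json]
```

Sending it is then `urllib.request.urlopen(req)` followed by `json.load` on the response. Keeping request construction separate from the network call makes it easy to log or count outgoing requests, which matters once you start watching cost.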

The cost nobody warns you about

Pricing for individual calls looks cheap. The Speech service free tier gives you 5 hours of speech-to-text per month. Document Intelligence gives you 500 pages free. But here is what happens in production: you start hitting the paid tiers fast, and the costs compound. Speech real-time transcription is charged per audio second. Document Intelligence charges per page. Face API charges per transaction. And when you are running these as part of an automated pipeline processing thousands of documents a day, the bill adds up in ways your initial estimate will not capture.

At OZ we ran a proof of concept with Document Intelligence processing about 2,000 invoices daily. The API cost alone was around $400/month, which was fine. But then we added PII redaction through the Language service, and translation for the multilingual ones, and suddenly we were looking at $1,200/month for what started as a "cheap API call" project. Always budget for the full pipeline, not just the primary service.
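Budgeting for the full pipeline is mostly a matter of listing every stage with its own volume, unit price, and free allowance, then summing. The sketch below shows that arithmetic. Every price in the usage example is a placeholder invented for illustration, not a real Azure rate and not a breakdown of the figures above; plug in the numbers from the current pricing pages for your region and tier.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Stage:
    name: str
    units_per_month: float    # pages, text records, characters, ...
    price_per_unit: float     # USD per unit (placeholder values below)
    free_units: float = 0.0   # monthly free allowance, if any

    def cost(self) -> float:
        """Monthly cost after the free allowance is used up."""
        billable = max(0.0, self.units_per_month - self.free_units)
        return billable * self.price_per_unit


def pipeline_cost(stages: Iterable[Stage]) -> float:
    """Total monthly cost across every stage of the pipeline."""
    return sum(stage.cost() for stage in stages)


def breakdown(stages: Iterable[Stage]) -> Dict[str, float]:
    """Per-stage monthly cost, to see which add-on dominates the bill."""
    return {stage.name: stage.cost() for stage in stages}


if __name__ == "__main__":
    pages = 2_000 * 30  # 2,000 invoices a day, assuming one page each
    stages = [
        # All prices are PLACEHOLDERS, not actual Azure pricing.
        Stage("extraction", pages, price_per_unit=0.01, free_units=500),
        Stage("pii_redaction", pages, price_per_unit=0.001),
        Stage("translation", pages * 0.2, price_per_unit=0.02),
    ]
    for name, cost in breakdown(stages).items():
        print(f"{name}: ${cost:,.2f}")
    print(f"total: ${pipeline_cost(stages):,.2f}")
```

The point of modelling each stage separately is that the secondary services rarely scale the same way as the primary one. Translation, for example, is typically billed on characters rather than pages, so its `units_per_month` needs its own estimate.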

The Foundry Tool Catalog is a solid collection if you know what each piece does and where the boundaries are. The biggest problem is not the tools themselves. It is figuring out which one to use when three of them seem to do the same thing.
