Azure AI Document Intelligence: Extract Data from Documents
Azure AI | Beginner | 5 Steps

Microsoft Learn | October 5, 2025 | 30 min video

Use Azure AI Document Intelligence to extract structured data from forms, invoices, receipts, and custom documents. Covers prebuilt models, custom models, and integration patterns.

Azure AI Document Intelligence

Azure AI Document Intelligence (formerly Form Recognizer) uses AI to extract text, key-value pairs, tables, and document structure from documents. It handles invoices, receipts, IDs, tax forms, contracts, and custom document types.

Key Capabilities

  • Prebuilt Models — Ready-to-use models for common document types
  • Custom Models — Train on your own document formats
  • Document Classification — Automatically sort documents by type
  • Add-on Features — Handwriting, barcodes, formulas, font styling
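Add-on features are switched on per request rather than per resource. As a minimal sketch, assuming the REST API's `features` query parameter and api-version 2024-11-30, the helper below builds an analyze URL with add-ons enabled and rejects names it does not recognize. The feature list and `analyze_url` are our own illustration, not an SDK call; in the Python SDK the corresponding knob is the `features` argument of `begin_analyze_document`.

```python
from urllib.parse import urlencode

# Add-on names accepted by the REST `features` query parameter.
# Assumed list for api-version 2024-11-30; confirm against the docs for your version.
KNOWN_FEATURES = {
    "ocrHighResolution",
    "languages",
    "barcodes",
    "formulas",
    "keyValuePairs",
    "styleFont",
    "queryFields",
}


def analyze_url(endpoint, model_id, features=(), api_version="2024-11-30"):
    """Build the analyze URL for a model with the requested add-on features enabled."""
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"Unknown add-on feature(s): {sorted(unknown)}")
    params = {"api-version": api_version}
    if features:
        # Comma-separated list, e.g. features=barcodes,formulas
        params["features"] = ",".join(features)
    base = endpoint.rstrip("/")
    # safe="," keeps the commas readable instead of encoding them as %2C
    query = urlencode(params, safe=",")
    return f"{base}/documentintelligence/documentModels/{model_id}:analyze?{query}"
```

For example, `analyze_url(endpoint, "prebuilt-layout", ["barcodes", "formulas"])` yields a URL ending in `prebuilt-layout:analyze?api-version=2024-11-30&features=barcodes,formulas`.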

Prerequisites

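  • Azure subscription
  • A Document Intelligence resource created in the Azure Portal (copy its endpoint and key from the resource's Keys and Endpoint page)
  • For the code samples: the azure-ai-documentintelligence package for Python (pip install azure-ai-documentintelligence) or the Azure.AI.DocumentIntelligence NuGet package for .NET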
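The samples in this tutorial hard-code the endpoint and key for brevity. A safer habit is to read them from the environment. This is a minimal sketch: the variable names `DOCUMENTINTELLIGENCE_ENDPOINT` and `DOCUMENTINTELLIGENCE_API_KEY` and the helpers `load_config` and `make_client` are assumptions, not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DocIntelConfig:
    endpoint: str
    key: str


def load_config(env: Mapping[str, str] = os.environ) -> DocIntelConfig:
    """Read the resource endpoint and key (variable names are our assumption)."""
    endpoint = env.get("DOCUMENTINTELLIGENCE_ENDPOINT", "").strip()
    key = env.get("DOCUMENTINTELLIGENCE_API_KEY", "").strip()
    if not endpoint or not key:
        raise RuntimeError(
            "Set DOCUMENTINTELLIGENCE_ENDPOINT and DOCUMENTINTELLIGENCE_API_KEY"
        )
    if not endpoint.startswith("https://"):
        raise ValueError("Endpoint must be the https:// URL shown in the Azure Portal")
    return DocIntelConfig(endpoint=endpoint, key=key)


def make_client(config: DocIntelConfig):
    """Create the SDK client (requires the azure-ai-documentintelligence package)."""
    # Imported lazily so load_config() works even where the SDK is not installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    return DocumentIntelligenceClient(
        endpoint=config.endpoint, credential=AzureKeyCredential(config.key)
    )
```

Typical use is `client = make_client(load_config())`. Failing early with a clear message beats a 401 from the service later.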

Step 1: Try the Studio

Document Intelligence Studio lets you test models visually.

  1. Go to documentintelligence.ai.azure.com
  2. Choose a prebuilt model (e.g., Invoice)
  3. Upload a sample document or use the provided samples
  4. Click Analyze and review extracted fields

Step 2: Use Prebuilt Models

Invoice Model

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://my-doc-intel.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-key")
)

# Analysis is a long-running operation, so the SDK returns a poller
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-invoice",
        body=f
    )
result = poller.result()  # blocks until the analysis finishes


def field_text(fields, name):
    """Return a field's text, or None when the model did not find that field."""
    field = fields.get(name)
    return field.content if field is not None else None


for invoice in result.documents or []:
    fields = invoice.fields or {}
    print(f"Vendor: {field_text(fields, 'VendorName')}")
    print(f"Invoice #: {field_text(fields, 'InvoiceId')}")
    print(f"Total: {field_text(fields, 'InvoiceTotal')}")
    print(f"Due Date: {field_text(fields, 'DueDate')}")

    # Line items: "Items" is an array field whose elements are object fields
    items = fields.get("Items")
    line_items = items.value_array if items is not None else None
    for item in line_items or []:
        line = item.value_object or {}
        print(f"  - {field_text(line, 'Description')}: {field_text(line, 'Amount')}")
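The same line-item walk can be written against the raw JSON shape of the REST response (`documents[].fields.Items.valueArray[].valueObject`). That shape is handy for unit-testing parsing code without calling the service. This is a minimal sketch: `line_items` is a hypothetical helper. Feeding it `result.as_dict()` assumes the SDK model serializes to the REST field names, which we believe holds but have not checked for every SDK version.

```python
def line_items(analyze_result: dict) -> list[tuple[str, str]]:
    """Collect (description, amount) text pairs from every invoice in an analyzeResult."""
    rows = []
    for doc in analyze_result.get("documents", []):
        items = doc.get("fields", {}).get("Items", {})
        for item in items.get("valueArray", []):
            obj = item.get("valueObject", {})
            # Either sub-field may be missing on a given line; fall back to "".
            desc = obj.get("Description", {}).get("content", "")
            amount = obj.get("Amount", {}).get("content", "")
            rows.append((desc, amount))
    return rows
```

Because it only touches plain dicts, you can test it against a saved JSON result, such as the result JSON that Document Intelligence Studio lets you download.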

Available Prebuilt Models

| Model | Use case |
|---|---|
| prebuilt-invoice | Invoices and bills |
| prebuilt-receipt | Receipts from retail/restaurants |
| prebuilt-idDocument | Identity documents such as passports and driver's licenses |
| prebuilt-tax.us.w2 | US W-2 tax forms |
| prebuilt-healthInsuranceCard.us | US health insurance cards |
| prebuilt-contract | Contracts and agreements |
| prebuilt-layout | General document structure: text, tables, selection marks |
| prebuilt-read | OCR text extraction (printed and handwritten) |
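prebuilt-layout returns each table as a flat list of cells, and each cell carries `rowIndex`, `columnIndex`, and optional `rowSpan`/`columnSpan`. Here is a minimal sketch that rebuilds one table into a grid, assuming that REST JSON shape. `table_to_grid` is our own helper. Copying a merged cell's text into every slot it covers is a design choice; you could leave the extra slots empty instead.

```python
def table_to_grid(table: dict) -> list[list[str]]:
    """Rebuild one `tables[i]` entry of a layout result into rows of cell text."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        text = cell.get("content", "")
        # Merged cells report spans; fill every grid slot the cell covers.
        for dr in range(cell.get("rowSpan", 1)):
            for dc in range(cell.get("columnSpan", 1)):
                grid[row + dr][col + dc] = text
    return grid
```

To convert every table in a result: `[table_to_grid(t) for t in analyze_result.get("tables", [])]`.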

Step 3: Build Custom Models

For documents unique to your business:

1. Gather Training Data

  • Minimum of 5 sample documents (15+ recommended)
  • Ensure variety in layouts and content
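Before uploading, a quick local check can catch a training set that falls short of the counts above. This is a minimal sketch. The extension list is a conservative assumption, not the service's full list of supported formats, and `check_training_files` is a hypothetical helper.

```python
from pathlib import PurePath

# File types assumed usable for custom model training (a conservative subset).
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
MIN_DOCS = 5           # minimum from the guidance above
RECOMMENDED_DOCS = 15  # recommended count from the guidance above


def check_training_files(names):
    """Count usable training documents by extension and return (count, warnings)."""
    docs = [n for n in names if PurePath(n).suffix.lower() in SUPPORTED_EXTENSIONS]
    warnings = []
    if len(docs) < MIN_DOCS:
        warnings.append(f"Only {len(docs)} documents; at least {MIN_DOCS} are needed.")
    elif len(docs) < RECOMMENDED_DOCS:
        warnings.append(f"{len(docs)} documents; {RECOMMENDED_DOCS}+ are recommended.")
    return len(docs), warnings
```

For a local folder, pass the file names, e.g. `check_training_files(p.name for p in Path("training-data").iterdir())` after `from pathlib import Path`.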

2. Label Your Documents

  1. Go to Document Intelligence Studio
  2. Create a new Custom extraction model project
  3. Upload your training documents
  4. Label the fields you want to extract (drag to select text regions)
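Studio keeps labels in the same storage container as the documents. In our understanding, it writes a `<document>.labels.json` file next to each labeled document, plus a `fields.json` schema. Treat those names as an assumption and check your container. Here is a minimal sketch that flags unlabeled documents before you start training. `unlabeled_documents` and `has_field_schema` are hypothetical helpers.

```python
DOC_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def unlabeled_documents(blob_names):
    """Return documents with no matching <name>.labels.json (naming is assumed)."""
    names = list(blob_names)  # allow any iterable, e.g. a generator of blob names
    present = set(names)
    docs = [n for n in names if n.lower().endswith(DOC_EXTENSIONS)]
    return sorted(d for d in docs if f"{d}.labels.json" not in present)


def has_field_schema(blob_names):
    """True if the container holds a fields.json schema, at the root or under a prefix."""
    return any(n == "fields.json" or n.endswith("/fields.json") for n in blob_names)
```

The blob names can come from any listing of the container, such as the names returned by the Azure Storage SDK's container client.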

3. Train the Model

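from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    AzureBlobContentSource, BuildDocumentModelRequest, DocumentBuildMode)
from azure.core.credentials import AzureKeyCredential
# Building models uses the administration client, not DocumentIntelligenceClient
admin_client = DocumentIntelligenceAdministrationClient(
    "https://my-doc-intel.cognitiveservices.azure.com/", AzureKeyCredential("your-key"))
poller = admin_client.begin_build_document_model(BuildDocumentModelRequest(
    model_id="my-purchase-order",
    build_mode=DocumentBuildMode.TEMPLATE,  # or DocumentBuildMode.NEURAL for varied layouts
    # Container SAS URL (or grant the resource's managed identity read access)
    azure_blob_source=AzureBlobContentSource(container_url="https://mystorage.blob.core.windows.net/training-data")))
model = poller.result()
print(f"Model ID: {model.model_id}")
for doc_type, details in (model.doc_types or {}).items():
    print(f"{doc_type} fields: {list(details.field_schema)}")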
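Once trained, the model is called like a prebuilt one, by passing `model_id="my-purchase-order"` to `begin_analyze_document`. Each extracted field carries a `confidence` score, so a common pattern is to send low-confidence fields to a person for review. This is a minimal sketch over the REST JSON shape. The 0.8 threshold and `fields_needing_review` are illustrative assumptions, and a field without a confidence value is treated as needing review.

```python
def fields_needing_review(analyze_result: dict, threshold: float = 0.8) -> dict[str, list[str]]:
    """Map each analyzed document to the sorted names of its low-confidence fields."""
    review = {}
    for i, doc in enumerate(analyze_result.get("documents", [])):
        low = [
            name
            for name, field in doc.get("fields", {}).items()
            # Missing confidence counts as 0.0, i.e. always reviewed.
            if field.get("confidence", 0.0) < threshold
        ]
        if low:
            review[f"{doc.get('docType', 'document')}#{i}"] = sorted(low)
    return review
```

Tune the threshold against your own labeled test set. A single global value is a simplification, since some fields matter more than others.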

Template vs Neural

| Feature | Template | Neural |
|---|---|---|
| Layout | Fixed/similar | Varied |
| Training data | 5+ docs | 15+ docs |
| Training speed | Fast | Slower |
| Best for | Structured forms | Semi-structured docs |

Step 4: Automate with Logic Apps / Power Automate

Power Automate Example

Trigger: When a file is added to the SharePoint "Invoices" folder
→ Extract invoice fields using Document Intelligence (prebuilt-invoice model)
→ If the invoice total > $10,000: send an approval request to the manager
→ Add a row to an Excel table with the extracted data
→ Archive the processed document
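The flow itself is assembled in the Power Automate designer, but its decision logic is simple to express and test in code, for example if you later move this step into an Azure Function. This is a minimal sketch, assuming the REST JSON shape where `InvoiceTotal` carries a `valueCurrency.amount`. `InvoiceRow`, `route_invoice`, and `APPROVAL_LIMIT` are hypothetical, and like the flow above, the sketch ignores the currency code.

```python
from dataclasses import dataclass

APPROVAL_LIMIT = 10_000  # threshold from the flow above; currency is not checked


@dataclass
class InvoiceRow:
    vendor: str
    invoice_id: str
    total: float
    needs_approval: bool


def _text(fields, name):
    """Prefer the typed string value, fall back to the raw content, else ''."""
    field = fields.get(name, {})
    return field.get("valueString") or field.get("content") or ""


def route_invoice(fields):
    """Turn one invoice's REST-shaped `fields` dict into a row plus an approval flag."""
    currency = fields.get("InvoiceTotal", {}).get("valueCurrency", {})
    total = float(currency.get("amount", 0.0))
    return InvoiceRow(
        vendor=_text(fields, "VendorName"),
        invoice_id=_text(fields, "InvoiceId"),
        total=total,
        needs_approval=total > APPROVAL_LIMIT,
    )
```

Keeping the rule in one pure function means the threshold can be unit-tested without SharePoint, Excel, or the service.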

.NET Integration Example

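using System;
using Azure;
using Azure.AI.DocumentIntelligence;

var client = new DocumentIntelligenceClient(
    new Uri("https://my-doc-intel.cognitiveservices.azure.com/"),
    new AzureKeyCredential("your-key"));

// The service downloads the file itself, so the URL must be reachable from Azure
var invoiceUri = new Uri("https://example.com/invoice.pdf");

// WaitUntil.Completed returns only after the long-running analysis finishes
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed, "prebuilt-invoice", invoiceUri);

AnalyzeResult result = operation.Value;
foreach (var doc in result.Documents)
{
    // TryGetValue skips fields the model did not find instead of throwing
    if (doc.Fields.TryGetValue("VendorName", out var vendor))
        Console.WriteLine($"Vendor: {vendor.Content}");
    if (doc.Fields.TryGetValue("InvoiceTotal", out var total))
        Console.WriteLine($"Total: {total.Content}");
}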
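`WaitUntil.Completed` hides a long-running operation. The REST API answers the analyze request with `202 Accepted` and an `Operation-Location` header, and the client polls that URL until `status` is `succeeded` or `failed`. Below is a standard-library Python sketch of that pattern, assuming api-version 2024-11-30 and key authentication through the `Ocp-Apim-Subscription-Key` header. `start_analysis` and `poll_operation` are our own names, and the fixed polling interval is a simplification.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def start_analysis(endpoint, model_id, document_url, key, api_version="2024-11-30"):
    """POST an analyze request for a URL source; return the Operation-Location to poll."""
    url = (f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
           f"{model_id}:analyze?api-version={api_version}")
    request = urllib.request.Request(
        url,
        data=json.dumps({"urlSource": document_url}).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:  # expects 202 Accepted
        return response.headers["Operation-Location"]


def _get_json(url, headers):
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as response:
        return json.load(response)


def poll_operation(operation_url, key, fetch: Optional[Callable] = None,
                   interval=2.0, timeout=120.0):
    """Poll until the operation finishes, then return its analyzeResult."""
    fetch = fetch or _get_json  # injectable so the loop can be tested offline
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(operation_url, {"Ocp-Apim-Subscription-Key": key})
        status = body.get("status")
        if status == "succeeded":
            return body["analyzeResult"]
        if status in ("failed", "canceled"):
            raise RuntimeError(f"Analysis {status}: {body.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)  # a production client would honor Retry-After instead
```

The SDKs wrap this same loop, so calling REST directly mainly makes sense where adding the SDK dependency is not an option.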

Resources

Video: Search for "Azure AI Document Intelligence" on the Microsoft Azure YouTube channel for the latest walkthroughs.

Tags: Document Intelligence, OCR, Form Recognizer, Azure AI, Automation


Chapters (5)

  1. Document Intelligence Overview: Capabilities and supported document types
  2. Prebuilt Models: Use invoice, receipt, and ID document models
  3. Custom Models: Train custom extraction models with your documents
  4. Studio & API: Test in Document Intelligence Studio and integrate via API
  5. Real-World Scenarios: Automation patterns with Logic Apps and Power Automate

About the Author

Microsoft Learn

Microsoft MVP | AI Engineer

Software & AI Engineer specializing in Microsoft Azure, .NET, and cutting-edge AI technologies.
