Build a RAG Application with Azure AI Search

Microsoft Learn · November 20, 2025 · 55 min video

Implement Retrieval-Augmented Generation (RAG) using Azure AI Search and Azure OpenAI. Index your documents, create vector embeddings, and build a grounded chat experience.

Retrieval-Augmented Generation (RAG) is the most popular pattern for grounding LLM responses with your own data. This tutorial shows how to build a production-ready RAG application using Azure AI Search and Azure OpenAI.

What is RAG?

User Question
    ↓
Retrieve relevant documents from Azure AI Search
    ↓
Augment the prompt with retrieved context
    ↓
Generate a grounded response with Azure OpenAI

Without RAG: "What is Contoso's refund policy?" → the model hallucinates an answer.
With RAG: "What is Contoso's refund policy?" → searches your docs → returns the actual policy.

Prerequisites

  • Azure AI Search service (Basic tier or higher for vector search)
  • Azure OpenAI with GPT-4o and text-embedding-ada-002 deployed
  • Azure Blob Storage (for document source)

Step 1: Set Up Azure AI Search

az search service create \
  --name "my-search-service" \
  --resource-group "my-rg" \
  --sku "basic" \
  --location "eastus2"

Step 2: Prepare and Index Documents

Option A: Integrated Vectorization (Recommended)

Azure AI Search can automatically chunk documents, generate embeddings, and index them.

  1. Go to Azure AI Search in Azure Portal
  2. Click Import and vectorize data
  3. Connect to your data source (Blob Storage, SQL, etc.)
  4. Configure chunking (default: 2000 tokens with 500 overlap)
  5. Select embedding model (Azure OpenAI text-embedding-ada-002)
  6. Create the index and indexer

Option B: Custom Indexing with Python

from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, VectorSearch,
    HnswAlgorithmConfiguration, VectorSearchProfile,
    SearchFieldDataType
)
from azure.identity import DefaultAzureCredential

# Create index with vector field
index = SearchIndex(
    name="documents",
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        SearchField(name="title", type="Edm.String", searchable=True),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=1536,
            vector_search_profile_name="my-profile"
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="my-hnsw")],
        profiles=[VectorSearchProfile(
            name="my-profile",
            algorithm_configuration_name="my-hnsw"
        )]
    )
)

index_client = SearchIndexClient(
    endpoint="https://my-search.search.windows.net",
    credential=DefaultAzureCredential()
)
index_client.create_or_update_index(index)

Document Chunking

def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 500) -> list[str]:
    """Split text into overlapping character chunks.

    overlap must be smaller than chunk_size, or start never advances.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break  # stop here so the tail isn't re-emitted as a duplicate chunk
        start = end - overlap
    return chunks
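
Character slicing like this can cut a chunk mid-sentence, which hurts retrieval quality. If you index manually, a variant that packs whole paragraphs into each chunk often works better. A sketch, assuming paragraphs are delimited by blank lines (the function name and fallback behavior are illustrative, not part of any SDK):

```python
def chunk_by_paragraph(text: str, chunk_size: int = 2000) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most chunk_size chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(para) > chunk_size:
            # Oversized paragraph: flush what we have, then hard-slice it.
            if current:
                chunks.append(current)
                current = ""
            for i in range(0, len(para), chunk_size):
                chunks.append(para[i:i + chunk_size])
        elif current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Unlike the overlapping slicer above, this keeps paragraph boundaries intact at the cost of slightly uneven chunk sizes.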

Step 3: Generate Embeddings

from openai import AzureOpenAI

openai_client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-08-01-preview",
    azure_endpoint="https://my-openai.openai.azure.com/"
)

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding
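
Vector search ranks documents by the similarity of these 1536-dimensional vectors. To build intuition (or debug retrieval locally), you can compare two embeddings yourself with cosine similarity; a minimal pure-Python helper, not part of the Azure SDK:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(get_embedding("refund policy"), get_embedding("return policy"))` should score much higher than a comparison against an unrelated sentence.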

Step 4: Search — Vector + Hybrid

Hybrid Search (Best Results)

Combines traditional keyword search with vector similarity:

from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

search_client = SearchClient(
    endpoint="https://my-search.search.windows.net",
    index_name="documents",
    credential=DefaultAzureCredential()
)

results = search_client.search(
    search_text="refund policy",  # keyword search
    vector_queries=[
        VectorizableTextQuery(
            text="What is the refund policy?",  # vector search
            k_nearest_neighbors=5,
            fields="content_vector"
        )
    ],
    select=["title", "content"],
    top=5
)

context = "\n\n".join([r["content"] for r in results])
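
Under the hood, Azure AI Search merges the keyword and vector rankings with Reciprocal Rank Fusion (RRF): each document's fused score is the sum of 1/(k + rank) across the result lists it appears in. A minimal sketch of the idea (illustrative, not the service's exact implementation):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to a document's score, so documents
    ranked well by *both* keyword and vector search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears high in both lists, so it beats "a", which only one list ranked first:
merged = rrf_merge([["a", "b", "c"], ["b", "c", "d"]])  # → ["b", "c", "a", "d"]
```

This is why hybrid search tends to outperform either mode alone: a document only needs to be near the top of one list to survive, but agreement between lists is rewarded most.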

Step 5: Generate Grounded Response

def ask_with_rag(question: str) -> str:
    # 1. Search for relevant documents
    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizableTextQuery(
            text=question, k_nearest_neighbors=5, fields="content_vector"
        )],
        top=5
    )

    context = "\n\n---\n\n".join([r["content"] for r in results])

    # 2. Generate grounded response
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""Answer the user's question
based ONLY on the following context. If the context doesn't contain
the answer, say "I don't have information about that."

Context:
{context}"""},
            {"role": "user", "content": question}
        ],
        temperature=0
    )

    return response.choices[0].message.content

# Usage
answer = ask_with_rag("What is the return policy for electronics?")
print(answer)
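
To support citations like the azd sample in Step 6, number each retrieved chunk in the context so the model can reference sources as [1], [2], and so on. A hedged sketch, assuming each result is a dict with "title" and "content" keys (the bracket convention and helper name are illustrative):

```python
def build_context(results: list[dict]) -> tuple[str, list[str]]:
    """Number retrieved chunks so the model can cite them as [1], [2], ..."""
    sources, blocks = [], []
    for i, doc in enumerate(results, start=1):
        sources.append(doc["title"])
        blocks.append(f"[{i}] {doc['title']}\n{doc['content']}")
    return "\n\n---\n\n".join(blocks), sources
```

Pass the returned context into the system prompt, instruct the model to cite with bracket numbers, and map them back to `sources` when rendering the answer.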

Step 6: Build the Chat UI

Use Azure's chat-with-your-data template to quickly build a frontend:

# Clone the official sample
git clone https://github.com/Azure-Samples/azure-search-openai-demo
cd azure-search-openai-demo

# Deploy with azd
azd up

This gives you a production-ready chat application with citation support, follow-up questions, and conversation history.

Step 7: Evaluate and Optimize

  • Relevance: Are the right documents being retrieved?
  • Groundedness: Is the LLM sticking to the retrieved context?
  • Chunk size: Experiment with 512, 1024, 2000 token chunks
  • Hybrid vs Vector: Test both approaches with your data
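
A simple way to measure retrieval relevance is top-k hit rate over a small labeled set of (question, expected document) pairs. A sketch with hypothetical names; `search_fn` stands in for a wrapper around `search_client.search` that returns ranked document ids:

```python
def retrieval_hit_rate(eval_set, search_fn, k: int = 5) -> float:
    """Fraction of questions whose expected document appears in the top-k results.

    eval_set: list of (question, expected_doc_id) pairs.
    search_fn: question -> ranked list of doc ids.
    """
    hits = sum(
        1 for question, expected in eval_set
        if expected in search_fn(question)[:k]
    )
    return hits / len(eval_set)

# With a stub retriever that always returns the same ranking:
rate = retrieval_hit_rate(
    [("refund?", "doc-1"), ("shipping?", "doc-9")],
    lambda q: ["doc-1", "doc-2", "doc-3"],
)  # doc-1 is found, doc-9 is not → 0.5
```

Run it once per configuration (chunk size, hybrid vs. vector) and compare hit rates before tuning anything else.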

Resources

Video: Watch the RAG with Azure AI Search deep dive on the Microsoft Azure YouTube channel.

Tags: RAG · Azure AI Search · Azure OpenAI · Vector Search · Embeddings

Chapters (7)

  1. RAG Architecture Overview (00:00) - How RAG works and why it matters
  2. Setting Up Azure AI Search (07:00) - Create a search service and configure indexes
  3. Document Indexing & Chunking (16:00) - Ingest documents with integrated vectorization
  4. Vector Search & Hybrid Search (25:00) - Configure vector and hybrid search strategies
  5. Integrating with Azure OpenAI (34:00) - Build the orchestration layer
  6. Building the Chat UI (43:00) - Create a conversational front end
  7. Evaluation & Optimization (50:00) - Measure quality and improve results

About the Author

Microsoft Learn
Microsoft MVP | AI Engineer

Software & AI Engineer specializing in Microsoft Azure, .NET, and cutting-edge AI technologies.
