# Building a RAG Application with Azure AI Search
Retrieval-Augmented Generation (RAG) is the most popular pattern for grounding LLM responses with your own data. This tutorial shows how to build a production-ready RAG application using Azure AI Search and Azure OpenAI.
## What is RAG?
```text
User Question
      ↓
Retrieve relevant documents from Azure AI Search
      ↓
Augment the prompt with retrieved context
      ↓
Generate a grounded response with Azure OpenAI
```
- **Without RAG:** "What is Contoso's refund policy?" → the AI hallucinates an answer
- **With RAG:** "What is Contoso's refund policy?" → searches your docs → returns the actual policy
## Prerequisites
- Azure AI Search service (Basic tier or higher for vector search)
- Azure OpenAI with GPT-4o and text-embedding-ada-002 deployed
- Azure Blob Storage (for document source)
## Step 1: Set Up Azure AI Search

```bash
az search service create \
  --name "my-search-service" \
  --resource-group "my-rg" \
  --sku "basic" \
  --location "eastus2"
```
## Step 2: Prepare and Index Documents

### Option A: Integrated Vectorization (Recommended)
Azure AI Search can automatically chunk documents, generate embeddings, and index them.
1. Go to your Azure AI Search service in the Azure Portal
2. Click **Import and vectorize data**
3. Connect to your data source (Blob Storage, SQL, etc.)
4. Configure chunking (default: 2000 tokens with 500 overlap)
5. Select the embedding model (Azure OpenAI text-embedding-ada-002)
6. Create the index and indexer
### Option B: Custom Indexing with Python

```python
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, VectorSearch,
    HnswAlgorithmConfiguration, VectorSearchProfile,
    SearchFieldDataType
)
from azure.identity import DefaultAzureCredential

# Create an index with a vector field
index = SearchIndex(
    name="documents",
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        SearchField(name="title", type="Edm.String", searchable=True),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=1536,  # text-embedding-ada-002 output size
            vector_search_profile_name="my-profile"
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="my-hnsw")],
        profiles=[VectorSearchProfile(
            name="my-profile",
            algorithm_configuration_name="my-hnsw"
        )]
    )
)

index_client = SearchIndexClient(
    endpoint="https://my-search.search.windows.net",
    credential=DefaultAzureCredential()
)
index_client.create_or_update_index(index)
```
### Document Chunking

```python
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 500):
    """Split a document into overlapping chunks (overlap must be < chunk_size)."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks
```
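A quick sanity check of the chunker, showing how the overlap plays out. Note this counts characters, not tokens; counting tokens would require a tokenizer such as tiktoken. The function is repeated here so the snippet runs standalone:

```python
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 500):
    """Split a document into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

text = "".join(str(i % 10) for i in range(5000))
chunks = chunk_document(text)

print([len(c) for c in chunks])             # [2000, 2000, 2000, 500]
print(chunks[0][-500:] == chunks[1][:500])  # True: consecutive chunks share 500 chars
```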
## Step 3: Generate Embeddings

```python
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-08-01-preview",
    azure_endpoint="https://my-openai.openai.azure.com/"
)

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding
```
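Chunking and embedding come together when you upload documents to the index. Here is a minimal sketch of that step; the helper name `build_search_documents` is my own, not an SDK function, but the field names match the Option B index schema and the output is ready for `SearchClient.upload_documents`:

```python
def build_search_documents(doc_id: str, title: str, chunks: list, embed_fn):
    """Turn text chunks into index documents matching the schema above."""
    return [
        {
            "id": f"{doc_id}-{i}",              # key field must be unique per chunk
            "title": title,
            "content": chunk,
            "content_vector": embed_fn(chunk),  # e.g. get_embedding from Step 3
        }
        for i, chunk in enumerate(chunks)
    ]

# docs = build_search_documents("policy-001", "Refund Policy", chunks, get_embedding)
# search_client.upload_documents(documents=docs)
```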
## Step 4: Search — Vector + Hybrid

### Hybrid Search (Best Results)
Combines traditional keyword search with vector similarity:
```python
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

search_client = SearchClient(
    endpoint="https://my-search.search.windows.net",
    index_name="documents",
    credential=DefaultAzureCredential()
)

results = search_client.search(
    search_text="refund policy",                # keyword search
    vector_queries=[
        VectorizableTextQuery(
            text="What is the refund policy?",  # vector search
            k_nearest_neighbors=5,
            fields="content_vector"
        )
    ],
    select=["title", "content"],
    top=5
)

context = "\n\n".join([r["content"] for r in results])
```

Note: `VectorizableTextQuery` asks the service to embed the query text for you, which requires a vectorizer on the index (set up automatically by Option A's integrated vectorization). If you built the index manually as in Option B, pass a precomputed embedding with `VectorizedQuery(vector=get_embedding(question), ...)` instead.
## Step 5: Generate Grounded Response

```python
def ask_with_rag(question: str) -> str:
    # 1. Retrieve relevant documents
    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizableTextQuery(
            text=question, k_nearest_neighbors=5, fields="content_vector"
        )],
        top=5
    )
    context = "\n\n---\n\n".join([r["content"] for r in results])

    # 2. Generate a grounded response
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""Answer the user's question
based ONLY on the following context. If the context doesn't contain
the answer, say "I don't have information about that."

Context:
{context}"""},
            {"role": "user", "content": question}
        ],
        temperature=0
    )
    return response.choices[0].message.content

# Usage
answer = ask_with_rag("What is the return policy for electronics?")
print(answer)
```
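The chat template in Step 6 ships with citation support. To get something similar in the hand-rolled `ask_with_rag`, one simple approach is to number the retrieved chunks in the context and instruct the model to cite them. A sketch of that idea (the helper below is illustrative, not part of any SDK):

```python
def format_context_with_citations(results: list) -> tuple:
    """Number each retrieved chunk so the model can cite sources as [1], [2], ..."""
    blocks, citations = [], []
    for i, r in enumerate(results, start=1):
        blocks.append(f"[{i}] {r['content']}")
        citations.append(f"[{i}] {r['title']}")
    return "\n\n---\n\n".join(blocks), citations
```

Pass the numbered context into the system prompt (adding an instruction like "cite sources by number"), then append the citation list under the model's answer.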
## Step 6: Build the Chat UI

Use Azure's chat-with-your-data template to quickly build a frontend:

```bash
# Clone the official sample
git clone https://github.com/Azure-Samples/azure-search-openai-demo
cd azure-search-openai-demo

# Deploy with azd
azd up
```
This gives you a production-ready chat application with citation support, follow-up questions, and conversation history.
## Step 7: Evaluate and Optimize

- **Relevance:** Are the right documents being retrieved?
- **Groundedness:** Is the LLM sticking to the retrieved context?
- **Chunk size:** Experiment with 512-, 1024-, and 2000-token chunks
- **Hybrid vs. vector:** Test both approaches with your data
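The relevance bullet above can be made measurable with a small labeled set of (question, relevant chunk IDs) pairs. A minimal recall@k sketch — the function name and data shapes here are my own choices, not from any evaluation library:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int = 5) -> float:
    """Fraction of the labeled relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant_ids
    return len(hits) / len(relevant_ids)

# Example: 1 of the 2 labeled chunks shows up in the top 3
print(recall_at_k(["c7", "c2", "c9"], {"c2", "c5"}, k=3))  # 0.5
```

Run it over every labeled question for each configuration (chunk size, hybrid vs. vector) and compare the averages.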
## Resources
- Azure AI Search Documentation
- RAG with Azure AI Search Tutorial
- Azure Search OpenAI Demo (GitHub)
- Microsoft Learn: RAG Training
Video: Watch the RAG with Azure AI Search deep dive on the Microsoft Azure YouTube channel.


