# Building a RAG Application with Azure AI Search
Retrieval-Augmented Generation (RAG) is the most popular pattern for grounding LLM responses with your own data. This tutorial shows how to build a production-ready RAG application using Azure AI Search and Azure OpenAI.
## What is RAG?
```text
User Question
      ↓
Retrieve relevant documents from Azure AI Search
      ↓
Augment the prompt with retrieved context
      ↓
Generate a grounded response with Azure OpenAI
```
- **Without RAG:** "What is Contoso's refund policy?" → the AI hallucinates an answer
- **With RAG:** "What is Contoso's refund policy?" → searches your docs → returns the actual policy
## Prerequisites
- Azure AI Search service (Basic tier or higher for vector search)
- Azure OpenAI with GPT-4o and text-embedding-ada-002 deployed
- Azure Blob Storage (for document source)
## Step 1: Set Up Azure AI Search

```bash
az search service create \
  --name "my-search-service" \
  --resource-group "my-rg" \
  --sku "basic" \
  --location "eastus2"
```
## Step 2: Prepare and Index Documents

### Option A: Integrated Vectorization (Recommended)
Azure AI Search can automatically chunk documents, generate embeddings, and index them.
1. Go to your Azure AI Search service in the Azure Portal
2. Click **Import and vectorize data**
3. Connect to your data source (Blob Storage, SQL, etc.)
4. Configure chunking (default: 2000 tokens with 500 overlap)
5. Select the embedding model (Azure OpenAI text-embedding-ada-002)
6. Create the index and indexer
### Option B: Custom Indexing with Python

```python
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, VectorSearch,
    HnswAlgorithmConfiguration, VectorSearchProfile,
    SearchFieldDataType
)
from azure.identity import DefaultAzureCredential

# Create an index with a vector field
index = SearchIndex(
    name="documents",
    fields=[
        SearchField(name="id", type="Edm.String", key=True),
        SearchField(name="content", type="Edm.String", searchable=True),
        SearchField(name="title", type="Edm.String", searchable=True),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=1536,  # text-embedding-ada-002 output size
            vector_search_profile_name="my-profile"
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="my-hnsw")],
        profiles=[VectorSearchProfile(
            name="my-profile",
            algorithm_configuration_name="my-hnsw"
        )]
    )
)

index_client = SearchIndexClient(
    endpoint="https://my-search.search.windows.net",
    credential=DefaultAzureCredential()
)
index_client.create_or_update_index(index)
```
### Document Chunking

```python
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 500):
    """Split a document into overlapping chunks (overlap must be < chunk_size)."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks
```
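A quick sanity check of the chunker, showing how the overlap plays out. Note this counts characters, not tokens; counting tokens would require a tokenizer such as tiktoken. The function is repeated here so the snippet runs standalone:

```python
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 500):
    """Split a document into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

text = "".join(str(i % 10) for i in range(5000))
chunks = chunk_document(text)

print([len(c) for c in chunks])             # [2000, 2000, 2000, 500]
print(chunks[0][-500:] == chunks[1][:500])  # True: consecutive chunks share 500 chars
```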
## Step 3: Generate Embeddings

```python
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    api_key="your-key",
    api_version="2024-08-01-preview",
    azure_endpoint="https://my-openai.openai.azure.com/"
)

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding
```
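Chunking and embedding come together when you upload documents to the index. Here is a minimal sketch of that step; the helper name `build_search_documents` is my own, not an SDK function, but the field names match the Option B index schema and the output is ready for `SearchClient.upload_documents`:

```python
def build_search_documents(doc_id: str, title: str, chunks: list, embed_fn):
    """Turn text chunks into index documents matching the schema above."""
    return [
        {
            "id": f"{doc_id}-{i}",              # key field must be unique per chunk
            "title": title,
            "content": chunk,
            "content_vector": embed_fn(chunk),  # e.g. get_embedding from Step 3
        }
        for i, chunk in enumerate(chunks)
    ]

# docs = build_search_documents("policy-001", "Refund Policy", chunks, get_embedding)
# search_client.upload_documents(documents=docs)
```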
## Step 4: Search — Vector + Hybrid

### Hybrid Search (Best Results)
Combines traditional keyword search with vector similarity:
```python
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

search_client = SearchClient(
    endpoint="https://my-search.search.windows.net",
    index_name="documents",
    credential=DefaultAzureCredential()
)

results = search_client.search(
    search_text="refund policy",                # keyword search
    vector_queries=[
        VectorizableTextQuery(
            text="What is the refund policy?",  # vector search
            k_nearest_neighbors=5,
            fields="content_vector"
        )
    ],
    select=["title", "content"],
    top=5
)

context = "\n\n".join([r["content"] for r in results])
```

Note: `VectorizableTextQuery` asks the service to embed the query text for you, which requires a vectorizer on the index (set up automatically by Option A's integrated vectorization). If you built the index manually as in Option B, pass a precomputed embedding with `VectorizedQuery(vector=get_embedding(question), ...)` instead.
## Step 5: Generate Grounded Response

```python
def ask_with_rag(question: str) -> str:
    # 1. Retrieve relevant documents
    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizableTextQuery(
            text=question, k_nearest_neighbors=5, fields="content_vector"
        )],
        top=5
    )
    context = "\n\n---\n\n".join([r["content"] for r in results])

    # 2. Generate a grounded response
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"""Answer the user's question
based ONLY on the following context. If the context doesn't contain
the answer, say "I don't have information about that."

Context:
{context}"""},
            {"role": "user", "content": question}
        ],
        temperature=0
    )
    return response.choices[0].message.content

# Usage
answer = ask_with_rag("What is the return policy for electronics?")
print(answer)
```
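The chat template in Step 6 ships with citation support. To get something similar in the hand-rolled `ask_with_rag`, one simple approach is to number the retrieved chunks in the context and instruct the model to cite them. A sketch of that idea (the helper below is illustrative, not part of any SDK):

```python
def format_context_with_citations(results: list) -> tuple:
    """Number each retrieved chunk so the model can cite sources as [1], [2], ..."""
    blocks, citations = [], []
    for i, r in enumerate(results, start=1):
        blocks.append(f"[{i}] {r['content']}")
        citations.append(f"[{i}] {r['title']}")
    return "\n\n---\n\n".join(blocks), citations
```

Pass the numbered context into the system prompt (adding an instruction like "cite sources by number"), then append the citation list under the model's answer.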
## Step 6: Build the Chat UI

Use Azure's chat-with-your-data template to quickly build a frontend:

```bash
# Clone the official sample
git clone https://github.com/Azure-Samples/azure-search-openai-demo
cd azure-search-openai-demo

# Deploy with azd
azd up
```
This gives you a production-ready chat application with citation support, follow-up questions, and conversation history.
## Step 7: Evaluate and Optimize

- **Relevance:** Are the right documents being retrieved?
- **Groundedness:** Is the LLM sticking to the retrieved context?
- **Chunk size:** Experiment with 512-, 1024-, and 2000-token chunks
- **Hybrid vs. vector:** Test both approaches with your data
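The relevance bullet above can be made measurable with a small labeled set of (question, relevant chunk IDs) pairs. A minimal recall@k sketch — the function name and data shapes here are my own choices, not from any evaluation library:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int = 5) -> float:
    """Fraction of the labeled relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant_ids
    return len(hits) / len(relevant_ids)

# Example: 1 of the 2 labeled chunks shows up in the top 3
print(recall_at_k(["c7", "c2", "c9"], {"c2", "c5"}, k=3))  # 0.5
```

Run it over every labeled question for each configuration (chunk size, hybrid vs. vector) and compare the averages.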
## Resources
- Azure AI Search Documentation
- RAG with Azure AI Search Tutorial
- Azure Search OpenAI Demo (GitHub)
- Microsoft Learn: RAG Training
Video: Watch the RAG with Azure AI Search deep dive on the Microsoft Azure YouTube channel.


