Introduction
Retrieval-Augmented Generation (RAG) pairs a large language model with your own data: relevant documents are retrieved at query time and passed to the model as context. By using Azure AI Search as the knowledge base and Azure OpenAI for generation, you can build AI systems whose responses are grounded in your documents rather than in the model's training data alone.
Architecture Overview
The RAG pattern consists of three main components:
- Data Ingestion: Process and index your documents
- Retrieval: Search for relevant context
- Generation: Use an LLM to generate a response grounded in the retrieved context
Setting Up Azure AI Search
Create Search Index
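    using Azure;
    using Azure.Search.Documents.Indexes;
    using Azure.Search.Documents.Indexes.Models;

    var index = new SearchIndex("documents-index")
    {
        Fields =
        {
            // Document key; every index needs exactly one key field.
            new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
            new SearchableField("content") { AnalyzerName = LexicalAnalyzerName.EnMicrosoft },
            new SearchableField("title"),
            // Vector field; the dimension count must match the embedding model
            // (1536 for text-embedding-ada-002).
            new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
            {
                IsSearchable = true,
                VectorSearchDimensions = 1536,
                VectorSearchProfileName = "vector-profile"
            }
        },
        VectorSearch = new()
        {
            // The profile ties the vector field to the HNSW algorithm configuration.
            Profiles = { new VectorSearchProfile("vector-profile", "hnsw-config") },
            Algorithms = { new HnswAlgorithmConfiguration("hnsw-config") }
        }
    };

    // Assumption: searchEndpoint and searchApiKey hold your search service URL and an admin key.
    var indexClient = new SearchIndexClient(new Uri(searchEndpoint), new AzureKeyCredential(searchApiKey));
    await indexClient.CreateOrUpdateIndexAsync(index);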
Generate Embeddings
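    using Azure;
    using Azure.AI.OpenAI;
    // OpenAIClient and EmbeddingsOptions come from the 1.0.0-beta releases of Azure.AI.OpenAI.
    var embeddingClient = new OpenAIClient(
        new Uri(endpoint),
        new AzureKeyCredential(apiKey)
    );

    // The first argument is your deployment name, which may differ from the model name.
    var embeddingOptions = new EmbeddingsOptions("text-embedding-ada-002", new[] { text });
    var embedding = await embeddingClient.GetEmbeddingsAsync(embeddingOptions);
    float[] vector = embedding.Value.Data[0].Embedding.ToArray(); // value for contentVector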
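Embeddings are useful because texts with similar meaning end up close together in vector space, and vector search ranks documents by that closeness. As a rough illustration of the comparison involved, here is a minimal sketch of cosine similarity, a common metric for text embeddings. `VectorMath` and `CosineSimilarity` are hypothetical names introduced for this example, not part of the Azure SDKs; in practice the search service performs this comparison for you.

```csharp
using System;

// Hypothetical helper for illustration only; not part of any Azure SDK.
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Values range from -1 to 1,
    // and vectors pointing in the same direction score 1.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; report 0 rather than dividing by zero.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Calling `VectorMath.CosineSimilarity(queryVector, documentVector)` on two embeddings from the same model gives a score that can be compared across documents.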
Implementing RAG
Search for Context
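    public async Task<List<string>> SearchAsync(string query, float[] queryVector)
    {
        var searchOptions = new SearchOptions
        {
            VectorSearch = new()
            {
                Queries = { new VectorizedQuery(queryVector) { KNearestNeighborsCount = 5, Fields = { "contentVector" } } }
            },
            Size = 5,
            Select = { "content", "title" }
        };

        // Passing both the query text and a vector query makes this a hybrid search.
        var results = await _searchClient.SearchAsync<SearchDocument>(query, searchOptions);

        // Stream results asynchronously instead of blocking on the synchronous GetResults().
        var contents = new List<string>();
        await foreach (var result in results.Value.GetResultsAsync())
        {
            // Skip documents that have no content value.
            if (result.Document.TryGetValue("content", out var content) && content is not null)
                contents.Add(content.ToString()!);
        }
        return contents;
    }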
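Because `SearchAsync` passes both the text `query` and a vector query, the request runs as a hybrid search, and Azure AI Search merges the keyword and vector rankings with Reciprocal Rank Fusion (RRF). The service does this internally, so application code does not need it; the sketch below only illustrates the idea on two ranked lists of document ids. `RankFusion`, `Fuse`, and the constant `k = 60` (a commonly used default) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; the search service does this internally.
public static class RankFusion
{
    // Each ranking lists document ids from best to worst.
    // A document's fused score is the sum of 1 / (k + rank) over every ranking it appears in.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                // Rank is 1-based, so the top hit contributes 1 / (k + 1).
                double contribution = 1.0 / (k + i + 1);
                scores.TryGetValue(ranking[i], out double current);
                scores[ranking[i]] = current + contribution;
            }
        }

        // Highest fused score first; ties are broken by id for a stable order.
        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

A document that ranks well in both lists ends up ahead of one that ranks first in only one of them, which is why hybrid search tends to be more robust than either method alone.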
Generate Response
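    public async Task<string> GenerateResponseAsync(string question, List<string> context)
    {
        // The retrieved chunks are injected into the system prompt as grounding context.
        var systemPrompt = $"""
            You are a helpful assistant. Answer questions based only on the provided context.
            If the context does not contain the answer, say that you don't know.
            Context:
            {string.Join("\n\n", context)}
            """;

        // "gpt-4" is the deployment name. The array needs an explicit element type
        // because the two message classes only share the ChatRequestMessage base type.
        var chatOptions = new ChatCompletionsOptions("gpt-4", new ChatRequestMessage[]
        {
            new ChatRequestSystemMessage(systemPrompt),
            new ChatRequestUserMessage(question)
        });

        var response = await _openAIClient.GetChatCompletionsAsync(chatOptions);
        return response.Value.Choices[0].Message.Content;
    }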
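The retrieval and generation steps can be connected into a single question-answering call. The following is a minimal sketch of that flow, with delegates standing in for the embedding call, `SearchAsync`, and `GenerateResponseAsync` so the example stays self-contained. `RagPipeline`, `AnswerAsync`, and the fallback message are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical orchestration helper; the delegates stand in for the real service calls.
public static class RagPipeline
{
    public static async Task<string> AnswerAsync(
        string question,
        Func<string, Task<float[]>> embedAsync,
        Func<string, float[], Task<List<string>>> searchAsync,
        Func<string, List<string>, Task<string>> generateAsync)
    {
        // 1. Embed the question with the same model used when indexing documents.
        float[] queryVector = await embedAsync(question);

        // 2. Retrieve the most relevant chunks for the question.
        List<string> context = await searchAsync(question, queryVector);

        // 3. Skip the model call when retrieval finds nothing to ground the answer in.
        if (context.Count == 0)
            return "I could not find relevant information to answer that.";

        // 4. Generate the answer from the question plus the retrieved context.
        return await generateAsync(question, context);
    }
}
```

In an application, `searchAsync` and `generateAsync` would be the `SearchAsync` and `GenerateResponseAsync` methods, and `embedAsync` would wrap the embeddings call. Keeping them as delegates also makes the flow easy to test with stubs.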
Best Practices
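- Chunk Documents Properly: Split documents along semantic boundaries such as headings and paragraphs, and overlap adjacent chunks so that context is not cut off mid-thought
- Hybrid Search: Combine vector and keyword search so that both meaning and exact terms contribute to ranking
- Reranking: Apply semantic reranking to the top results before passing them to the model
- Prompt Engineering: Write system prompts that tell the model to answer from the provided context and to admit when the context is insufficient
- Monitor Quality: Track answer quality and retrieval relevance over time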
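To make the chunking advice concrete, here is a minimal sketch of chunking with overlap. It is not semantic chunking: it uses a simpler fixed-size window over words, which is enough to show how overlap carries context across chunk boundaries. `TextChunker`, `ChunkByWords`, and the default sizes are assumptions chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only: fixed-size word windows, not semantic chunking.
public static class TextChunker
{
    // Splits text into chunks of up to chunkSize words, where each chunk
    // repeats the last `overlap` words of the previous one.
    public static List<string> ChunkByWords(string text, int chunkSize = 200, int overlap = 40)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlap));

        string[] words = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var chunks = new List<string>();
        int step = chunkSize - overlap; // how far the window advances each time

        for (int start = 0; start < words.Length; start += step)
        {
            int length = Math.Min(chunkSize, words.Length - start);
            chunks.Add(string.Join(" ", words, start, length));

            // Stop once the window has reached the end of the text.
            if (start + length >= words.Length) break;
        }
        return chunks;
    }
}
```

Each chunk would then be embedded and uploaded as its own search document. A production pipeline would more likely split on headings or sentences and measure size in tokens rather than words.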
Conclusion
RAG lets you build AI applications that draw on your organization's knowledge. Grounding each answer in retrieved documents improves accuracy and reduces hallucinations.


