Implementing RAG Pattern with Azure AI Search and OpenAI

Introduction

Retrieval-Augmented Generation (RAG) combines a large language model with your own data. Azure AI Search acts as the knowledge base that retrieves relevant passages, and Azure OpenAI generates the answer from those passages, so responses are grounded in your documents rather than in the model's training data alone.

Architecture Overview

The RAG pattern consists of three main components:

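Data Ingestion: Split your documents into chunks, generate an embedding for each chunk, and index both in Azure AI Search
Retrieval: Search the index for the chunks most relevant to the user's question
Generation: Pass the retrieved chunks to the LLM as context so that it answers from your data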
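
To make the three stages concrete before any Azure services are involved, here is a minimal in-memory sketch. Everything in it is hypothetical: the TinyRagPipeline class and its word-overlap scoring are illustrative stand-ins, not part of the Azure SDK, and the generation stage only builds the prompt that would be sent to a model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the three RAG stages (not the Azure SDK).
public sealed class TinyRagPipeline
{
    private readonly List<string> _chunks = new();

    // 1. Data ingestion: store each document as a retrievable chunk.
    public void Ingest(IEnumerable<string> documents) => _chunks.AddRange(documents);

    // 2. Retrieval: rank chunks by how many distinct query words they contain.
    public List<string> Retrieve(string query, int top)
    {
        var words = query.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToArray();

        return _chunks
            .Select(chunk => (chunk, score: words.Count(w => chunk.ToLowerInvariant().Contains(w))))
            .Where(x => x.score > 0)
            .OrderByDescending(x => x.score)
            .Take(top)
            .Select(x => x.chunk)
            .ToList();
    }

    // 3. Generation: build the grounded prompt that would be sent to the LLM.
    public string BuildPrompt(string question, IEnumerable<string> context)
    {
        var joined = string.Join("\n\n", context);
        return "Context:\n" + joined + "\n\nQuestion: " + question;
    }
}
```

In the real pattern, Ingest corresponds to indexing in Azure AI Search, Retrieve to a search query, and BuildPrompt to the chat completion request.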

Setting Up Azure AI Search

Create Search Index

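using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

// searchEndpoint and searchApiKey are placeholders for your own search service settings.
var indexClient = new SearchIndexClient(new Uri(searchEndpoint), new AzureKeyCredential(searchApiKey));

var index = new SearchIndex("documents-index")
{
    Fields =
    {
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
        new SearchableField("content") { AnalyzerName = LexicalAnalyzerName.EnMicrosoft },
        new SearchableField("title"),
        // Vector field: the dimension count must match the embedding model (1536 for text-embedding-ada-002).
        new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            VectorSearchDimensions = 1536,
            VectorSearchProfileName = "vector-profile"
        }
    },
    VectorSearch = new()
    {
        // The profile ties the vector field to the HNSW algorithm configuration by name.
        Profiles = { new VectorSearchProfile("vector-profile", "hnsw-config") },
        Algorithms = { new HnswAlgorithmConfiguration("hnsw-config") }
    }
};

// Create the index, or update it if it already exists.
await indexClient.CreateOrUpdateIndexAsync(index);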

Generate Embeddings

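using Azure;
using Azure.AI.OpenAI;

// Azure.AI.OpenAI 1.0.0-beta client; endpoint and apiKey come from your Azure OpenAI resource.
var embeddingClient = new OpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey));

// The first argument is your deployment name, which may differ from the model name.
var embeddingOptions = new EmbeddingsOptions("text-embedding-ada-002", new[] { text });
var response = await embeddingClient.GetEmbeddingsAsync(embeddingOptions);

// Extract the vector (1536 floats for text-embedding-ada-002) from the response.
float[] embedding = response.Value.Data[0].Embedding.ToArray();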
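
Vector search ranks documents by how close their embeddings are to the query embedding. Cosine similarity is a common closeness measure for text embeddings; the sketch below shows the calculation on plain float arrays. The VectorMath helper is a hypothetical name used for illustration only. In practice the search service performs this comparison for you, approximately, through the HNSW index.

```csharp
using System;

// Hypothetical helper: illustrates the comparison a vector index performs.
public static class VectorMath
{
    // Returns a value in [-1, 1]; 1 means the vectors point in the same direction.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as unrelated to everything.
        if (normA == 0 || normB == 0) return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, regardless of vector length.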

Implementing RAG

Search for Context

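public async Task<List<string>> SearchAsync(string query, float[] queryVector)
{
    var searchOptions = new SearchOptions
    {
        VectorSearch = new()
        {
            Queries = { new VectorizedQuery(queryVector) { KNearestNeighborsCount = 5, Fields = { "contentVector" } } }
        },
        Size = 5,
        Select = { "content", "title" }
    };

    // Passing the query text together with the vector query runs a hybrid (keyword + vector) search.
    var results = await _searchClient.SearchAsync<SearchDocument>(query, searchOptions);

    // Enumerate asynchronously and skip documents whose content is missing or empty.
    var passages = new List<string>();
    await foreach (var result in results.Value.GetResultsAsync())
    {
        if (result.Document.TryGetValue("content", out var value) && value?.ToString() is { Length: > 0 } content)
            passages.Add(content);
    }
    return passages;
}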
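
Because the method above sends both the query text and a vector, the service has to merge a keyword ranking and a vector ranking into one result list. Azure AI Search does this with Reciprocal Rank Fusion (RRF). The sketch below shows the idea on plain lists of document ids; the RankFusion class is hypothetical, and the constant k = 60 is a commonly used value assumed here for illustration rather than a setting read from the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: illustrates Reciprocal Rank Fusion on document ids.
public static class RankFusion
{
    // Each input list holds document ids ordered best-first.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
        {
            for (var i = 0; i < ranking.Count; i++)
            {
                // Rank is 1-based, so the top hit of each list contributes 1 / (k + 1).
                scores.TryGetValue(ranking[i], out var current);
                scores[ranking[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties are broken by id to keep the order deterministic.
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key)
            .ToList();
    }
}
```

For a keyword ranking of a, b, c and a vector ranking of b, d, a, the fused order is b, a, d, c: documents that appear in both lists rise above documents that appear in only one.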

Generate Response

public async Task<string> GenerateResponseAsync(string question, List<string> context)
{
    var systemPrompt = $"""
        You are a helpful assistant. Answer questions using only the provided context.
        If the context does not contain the answer, say that you don't know.

        Context:
        {string.Join("\n\n", context)}
        """;

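    // The array type must be explicit: the two message types only share the ChatRequestMessage base class.
    var chatOptions = new ChatCompletionsOptions("gpt-4", new ChatRequestMessage[]
    {
        new ChatRequestSystemMessage(systemPrompt),
        new ChatRequestUserMessage(question)
    });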

    var response = await _openAIClient.GetChatCompletionsAsync(chatOptions);
    return response.Value.Choices[0].Message.Content;
}
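
The method above places every retrieved passage into the system prompt. When passages are long, it can help to cap how much context is sent. The following is a minimal sketch under stated assumptions: the ContextBudget class is hypothetical, and it counts characters as a rough proxy for tokens, whereas a production system would more likely use a tokenizer that matches the model.

```csharp
using System.Collections.Generic;

// Hypothetical helper: keeps the highest-ranked passages that fit a size budget.
public static class ContextBudget
{
    // rankedChunks must be ordered best-first; maxChars is a rough stand-in for a token limit.
    public static List<string> Fit(IEnumerable<string> rankedChunks, int maxChars)
    {
        var kept = new List<string>();
        var used = 0;
        foreach (var chunk in rankedChunks)
        {
            // Stop at the first passage that would exceed the budget,
            // so lower-ranked passages never displace higher-ranked ones.
            if (used + chunk.Length > maxChars) break;
            kept.Add(chunk);
            used += chunk.Length;
        }
        return kept;
    }
}
```

The result of Fit can be passed as the context argument of GenerateResponseAsync in place of the full list.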

Best Practices

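Chunk Documents Properly: Split documents into semantically coherent chunks, and overlap adjacent chunks so that content spanning a boundary stays retrievable
Hybrid Search: Combine vector and keyword search so that both exact terms and paraphrased questions find a match
Reranking: Apply semantic reranking to the top results before passing them to the model
Prompt Engineering: Instruct the model to answer only from the supplied context and to say when the context is insufficient
Monitor Quality: Track retrieval relevance and answer quality over time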
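
As an illustration of chunking with overlap, here is a minimal sketch. Note that it uses fixed-size word windows rather than true semantic chunking, which would split on sentence or section boundaries. The Chunker class and its parameters are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: fixed-size word windows with overlap (not semantic chunking).
public static class Chunker
{
    public static List<string> ChunkByWords(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var words = text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();

        // Each window starts (chunkSize - overlap) words after the previous one,
        // so consecutive chunks share exactly 'overlap' words.
        var step = chunkSize - overlap;
        for (var start = 0; start < words.Length; start += step)
        {
            var length = Math.Min(chunkSize, words.Length - start);
            chunks.Add(string.Join(" ", words, start, length));

            // Stop once a window reaches the end, to avoid a trailing chunk of pure overlap.
            if (start + length >= words.Length) break;
        }
        return chunks;
    }
}
```

With the input "a b c d e f g", a chunk size of 4, and an overlap of 1, this produces two chunks, "a b c d" and "d e f g", with the shared word "d" carrying context across the boundary.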

Conclusion

RAG enables you to build AI applications that draw on your organization's knowledge. Because answers are grounded in retrieved content, they tend to be more accurate and less prone to hallucination.
