Implementing RAG Pattern with Azure AI Search and OpenAI

Khawar Habib | December 20, 2023 | 15 min read

Learn how to build a Retrieval-Augmented Generation system using Azure AI Search and Azure OpenAI for accurate, context-aware AI responses.

Introduction

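Retrieval-Augmented Generation (RAG) grounds a large language model's answers in your own data: passages relevant to the question are retrieved at query time and passed to the model as context. With Azure AI Search as the knowledge base and Azure OpenAI for generation, you can build AI systems whose responses draw on your documents rather than only on what the model learned in training.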

Architecture Overview

The RAG pattern consists of three main components:

  1. Data Ingestion: Process and index your documents
  2. Retrieval: Search for relevant context
  3. Generation: Use an LLM to generate responses from the retrieved context

Setting Up Azure AI Search

Create Search Index

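// Requires the Azure.Search.Documents package (11.5.0 or later for vector search).
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var index = new SearchIndex("documents-index")
{
    Fields =
    {
        // Document key: must be unique for every indexed chunk.
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
        // Full-text fields used for keyword search.
        new SearchableField("content") { AnalyzerName = LexicalAnalyzerName.EnMicrosoft },
        new SearchableField("title"),
        // Vector field: 1536 matches the output size of text-embedding-ada-002.
        new SearchField("contentVector", SearchFieldDataType.Collection(SearchFieldDataType.Single))
        {
            IsSearchable = true,
            VectorSearchDimensions = 1536,
            VectorSearchProfileName = "vector-profile"
        }
    },
    VectorSearch = new()
    {
        // The profile ties the vector field to the HNSW algorithm configuration.
        Profiles = { new VectorSearchProfile("vector-profile", "hnsw-config") },
        Algorithms = { new HnswAlgorithmConfiguration("hnsw-config") }
    }
};

// Create the index in the service. searchEndpoint and searchApiKey are assumed
// to hold your search service URL and admin key.
var indexClient = new SearchIndexClient(new Uri(searchEndpoint), new AzureKeyCredential(searchApiKey));
await indexClient.CreateOrUpdateIndexAsync(index);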
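
The HNSW configuration above ranks vectors by a similarity metric, which in Azure AI Search defaults to cosine similarity. The sketch below shows what that metric computes for two embeddings. The `VectorMath` class is a hypothetical helper for illustration only; the search service performs this comparison internally, on 1536-dimensional vectors in this index.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // 1 means the vectors point the same way, 0 means they are orthogonal.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

For example, `VectorMath.CosineSimilarity(new[] { 1f, 0f }, new[] { 1f, 0f })` is 1, while comparing `{ 1f, 0f }` with `{ 0f, 1f }` gives 0.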

Generate Embeddings

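using Azure;
using Azure.AI.OpenAI;

var embeddingClient = new OpenAIClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey)
);

// For Azure OpenAI, the first argument is the embedding deployment name.
var embeddingOptions = new EmbeddingsOptions("text-embedding-ada-002", new[] { text });
var embedding = await embeddingClient.GetEmbeddingsAsync(embeddingOptions);
// One embedding is returned per input string; take the first as a float array.
float[] vector = embedding.Value.Data[0].Embedding.ToArray();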
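
Once a chunk has an embedding, it still has to be uploaded to the index before it can be retrieved. The following is a minimal sketch of that ingestion step, assuming the `documents-index` fields defined earlier. The `DocumentChunk` and `Ingestion` types and the `searchEndpoint` and `searchApiKey` parameters are hypothetical names introduced here; only `SearchClient` and `UploadDocumentsAsync` come from the Azure.Search.Documents package.

```csharp
using System;
using System.Text.Json.Serialization;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents;

// Hypothetical model whose JSON property names match the index fields.
public sealed class DocumentChunk
{
    [JsonPropertyName("id")]
    public string Id { get; set; } = "";

    [JsonPropertyName("title")]
    public string Title { get; set; } = "";

    [JsonPropertyName("content")]
    public string Content { get; set; } = "";

    // Embedding of Content; expected to have 1536 elements for this index.
    [JsonPropertyName("contentVector")]
    public float[] ContentVector { get; set; } = Array.Empty<float>();
}

public static class Ingestion
{
    // Uploads a single chunk. searchEndpoint and searchApiKey are assumed inputs.
    public static async Task UploadChunkAsync(
        string searchEndpoint, string searchApiKey, DocumentChunk chunk)
    {
        var searchClient = new SearchClient(
            new Uri(searchEndpoint), "documents-index", new AzureKeyCredential(searchApiKey));

        // Upload inserts new documents and replaces existing ones with the same key.
        await searchClient.UploadDocumentsAsync(new[] { chunk });
    }
}
```

In practice you would reuse one `SearchClient` and upload chunks in batches rather than one at a time.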

Implementing RAG

Search for Context

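// Assumes: using Azure.Search.Documents; using Azure.Search.Documents.Models;
public async Task<List<string>> SearchAsync(string query, float[] queryVector)
{
    var searchOptions = new SearchOptions
    {
        VectorSearch = new()
        {
            // Retrieve the 5 nearest neighbors of the query vector in "contentVector".
            Queries = { new VectorizedQuery(queryVector) { KNearestNeighborsCount = 5, Fields = { "contentVector" } } }
        },
        Size = 5,
        Select = { "content", "title" }
    };

    // Passing the query text together with the vector makes this a hybrid (keyword + vector) search.
    var response = await _searchClient.SearchAsync<SearchDocument>(query, searchOptions);

    // Enumerate results asynchronously instead of blocking on GetResults().
    var passages = new List<string>();
    await foreach (var result in response.Value.GetResultsAsync())
    {
        passages.Add(result.Document["content"]?.ToString() ?? string.Empty);
    }
    return passages;
}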
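
Because the search call above passes both the query text and a vector, Azure AI Search runs a keyword query and a vector query and merges the two ranked lists with Reciprocal Rank Fusion (RRF). The service does this for you; the sketch below only illustrates the idea. The `RankFusion` class, the document ids in the usage note, and the default constant `k = 60` (a commonly used value) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RankFusion
{
    // Each input list holds document ids ordered from best to worst match.
    // A document's fused score is the sum of 1 / (k + rank) over every list it appears in.
    public static List<string> Fuse(IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                int rank = i + 1; // ranks are 1-based
                scores.TryGetValue(list[i], out double current);
                scores[list[i]] = current + 1.0 / (k + rank);
            }
        }

        // Highest fused score first; ties are broken by id for a stable order.
        return scores
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

With a keyword ranking of `a, b, c` and a vector ranking of `b, d, a`, `RankFusion.Fuse(new[] { new[] { "a", "b", "c" }, new[] { "b", "d", "a" } })` returns `b, a, d, c`: documents found by both queries rise above those found by only one.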

Generate Response

public async Task<string> GenerateResponseAsync(string question, List<string> context)
{
    // Interpolated raw string literal (requires C# 11 or later).
    var systemPrompt = $"""
        You are a helpful assistant. Answer questions based on the provided context.
        If you don't know the answer, say so.

        Context:
        {string.Join("\n\n", context)}
        """;

    // "gpt-4" is the chat deployment name. The array needs an explicit element type:
    // new[] cannot infer a common type for the system and user message classes.
    var chatOptions = new ChatCompletionsOptions("gpt-4", new ChatRequestMessage[]
    {
        new ChatRequestSystemMessage(systemPrompt),
        new ChatRequestUserMessage(question)
    });

    var response = await _openAIClient.GetChatCompletionsAsync(chatOptions);
    return response.Value.Choices[0].Message.Content;
}
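
The retrieval and generation methods above, together with the embedding call, cover everything needed to answer a single question. A minimal sketch of how they could be chained follows. The `RagPipeline` class and its delegate parameters are hypothetical: they stand in for the embedding call, `SearchAsync`, and `GenerateResponseAsync` so that the flow can be shown, and tested, without any Azure dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class RagPipeline
{
    private readonly Func<string, Task<float[]>> _embed;
    private readonly Func<string, float[], Task<List<string>>> _search;
    private readonly Func<string, List<string>, Task<string>> _generate;

    public RagPipeline(
        Func<string, Task<float[]>> embed,
        Func<string, float[], Task<List<string>>> search,
        Func<string, List<string>, Task<string>> generate)
    {
        _embed = embed;
        _search = search;
        _generate = generate;
    }

    public async Task<string> AnswerAsync(string question)
    {
        // 1. Embed the question with the same model used for the documents.
        float[] queryVector = await _embed(question);

        // 2. Retrieve the most relevant passages (hybrid search).
        List<string> context = await _search(question, queryVector);

        // 3. Ask the chat model to answer using only those passages.
        return await _generate(question, context);
    }
}
```

Passing the stages in as delegates also makes it easy to substitute fakes when testing the orchestration logic.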

Best Practices

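  1. Chunk Documents Properly: Split documents into chunks small enough to embed and retrieve individually, with some overlap so that text spanning a boundary is not lost. Where possible, split on semantic boundaries such as headings or paragraphs.
  2. Hybrid Search: Combine vector and keyword search so that both exact terms (product names, error codes) and paraphrased questions are matched.
  3. Reranking: Apply semantic reranking to the top results before passing them to the model.
  4. Prompt Engineering: Write system prompts that tell the model to answer from the supplied context and to say when the context is insufficient.
  5. Monitor Quality: Track answer quality and retrieval relevance over time.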
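
The first practice can be illustrated with a sliding-window chunker. This is a deliberately simple, character-based sketch, not true semantic chunking, which would split on headings, paragraphs, or sentences. The `TextChunker` class and the sizes in the usage note are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TextChunker
{
    // Splits text into windows of at most chunkSize characters,
    // where each window repeats the last `overlap` characters of the previous one.
    public static List<string> Chunk(string text, int chunkSize, int overlap)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize), "Chunk size must be positive.");
        if (overlap < 0 || overlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlap), "Overlap must be in [0, chunkSize).");

        var chunks = new List<string>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length)
                break; // this window reached the end of the text
        }
        return chunks;
    }
}
```

For example, `TextChunker.Chunk("abcdefghij", 4, 1)` yields `abcd`, `defg`, and `ghij`: each chunk starts with the last character of the one before it.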

Conclusion

RAG lets you build AI applications that draw on your organization's knowledge, which improves accuracy and reduces hallucinations.


About the Author

Khawar Habib

Microsoft MVP | AI Engineer

Software & AI Engineer specializing in Microsoft Azure, .NET, and cutting-edge AI technologies.
