Build an Enterprise RAG Chatbot with Amazon Bedrock & AWS

Introduction: The Limitations of Generic LLMs

While ChatGPT has revolutionized how we interact with AI, enterprises face critical challenges when using generic large language models:

- Outdated Knowledge: Models are trained on data up to a specific cutoff date
- No Access to Proprietary Data: Cannot answer questions about your internal documents, policies, or databases
- Hallucination Risk: Models may invent plausible-sounding but incorrect information
- Security Concerns: Sensitive data exposure when using public APIs

The solution? Retrieval-Augmented Generation (RAG) - a technique that combines the power of LLMs with your proprietary data. In this comprehensive guide, we'll build a production-ready enterprise chatbot using AWS's managed services.

Architecture Overview

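Here's what we're building: documents stored in S3 are indexed by an Amazon Bedrock Knowledge Base (Titan embeddings in an OpenSearch Serverless vector store). A Lambda function behind API Gateway retrieves the most relevant passages for each question and passes them to Claude to generate a grounded answer. A simple web page sends questions to the API and displays the answer with its sources.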

Prerequisites

Before we begin, ensure you have:

- AWS Account with appropriate permissions
- Amazon Bedrock Access requested (go to Bedrock console → Model access)
- Python 3.9+ and AWS CLI configured
- Sample documents for testing (PDFs, Word docs, text files)

Step 1: Setting Up the Knowledge Base

1.1 Create an S3 Bucket for Your Documents
```bash
# Create a unique bucket name
BUCKET_NAME="enterprise-rag-documents-$(date +%s)"
aws s3 mb "s3://$BUCKET_NAME"

# Upload sample documents under the documents/ prefix used by the data source
aws s3 cp ./documents/ "s3://$BUCKET_NAME/documents/" --recursive
```

1.2 Configure Amazon Bedrock Knowledge Base

Navigate to Amazon Bedrock → Knowledge Bases → Create Knowledge Base

Configuration Parameters:
- Knowledge base name: enterprise-knowledge-base
- IAM role: Create new role with S3 and Bedrock permissions
- Data source: Your S3 bucket
- Embeddings model: amazon.titan-embed-text-v2:0 (default)
- Vector database: Choose Quick create a new vector store
- Advanced settings: Enable hybrid search for better results

    {
      "knowledgeBaseConfiguration": {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
          "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        }
      },
      "storageConfiguration": {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
          "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/your-collection",
          "vectorIndexName": "enterprise-docs-index",
          "fieldMapping": {
            "vectorField": "embedding",
            "textField": "content",
            "metadataField": "metadata"
          }
        }
      },
      "dataSourceConfiguration": {
        "type": "S3",
        "s3Configuration": {
          "bucketArn": "arn:aws:s3:::your-documents-bucket",
          "inclusionPrefixes": ["documents/"]
        }
      }
    }
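
Documents uploaded to S3 only become searchable after the data source has been synchronized. The following is a minimal sketch of triggering and waiting for that sync, assuming `agent_client` is `boto3.client('bedrock-agent')` and that you already have the knowledge base ID and data source ID from the console. The function name, polling interval, and retry limit are illustrative choices, not part of the setup above.

```python
import time

# Job states after which polling can stop
TERMINAL_STATUSES = ("COMPLETE", "FAILED", "STOPPED")


def sync_knowledge_base(agent_client, knowledge_base_id, data_source_id,
                        poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal status.

    agent_client is expected to behave like boto3.client('bedrock-agent').
    """
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    polls = 0
    while job["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError("Ingestion job did not finish in time")
        sleep(poll_seconds)
        polls += 1
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]
```

Passing the client in as an argument keeps the function easy to exercise with a stub before running it against a real account.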

Step 2: Building the Backend Orchestrator

2.1 Create Lambda Function with Dependencies

Create a requirements.txt:

    boto3>=1.28.0
    aws-lambda-powertools>=2.0.0
    python-dotenv>=1.0.0

Create the Lambda function:

    # lambda_handler.py
    import json
    import boto3
    import os
    from typing import Dict, Any
    from botocore.exceptions import ClientError

    # Initialize AWS clients
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
    bedrock = boto3.client('bedrock-runtime')

    class RAGOrchestrator:
        def __init__(self, knowledge_base_id: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
            self.knowledge_base_id = knowledge_base_id
            self.model_id = model_id
            self.region = os.environ.get('AWS_REGION', 'us-east-1')

        def retrieve_context(self, query: str, max_results: int = 5) -> Dict[str, Any]:
            """Retrieve relevant context from knowledge base"""
            try:
                response = bedrock_agent_runtime.retrieve(
                    knowledgeBaseId=self.knowledge_base_id,
                    retrievalQuery={
                        'text': query
                    },
                    retrievalConfiguration={
                        'vectorSearchConfiguration': {
                            'numberOfResults': max_results,
                            'overrideSearchType': 'HYBRID'
                        }
                    }
                )

                # Extract and format retrieved passages
                contexts = []
                for result in response.get('retrievalResults', []):
                    contexts.append({
                        'content': result['content']['text'],
                        'metadata': result.get('metadata', {}),
                        'score': result.get('score', 0.0)
                    })

                return {
                    'contexts': contexts,
                    'total_results': len(contexts)
                }

            except ClientError as e:
                print(f"Error retrieving context: {e}")
                return {'contexts': [], 'total_results': 0}

        def generate_response(self, query: str, context: str) -> str:
            """Generate response using LLM with retrieved context"""

            # Prepare the prompt with context
            # The Messages API supplies the roles, so no "Human:"/"Assistant:" markers are needed
            prompt = f"""You are an expert assistant for our enterprise. Use the following context to answer the question.

            Context:
            {context}

            Question: {query}

            Instructions:
            1. Answer based ONLY on the provided context
            2. If the context doesn't contain relevant information, say "I don't have enough information to answer this question based on the available documents."
            3. Cite specific sources when possible
            4. Keep the response concise and professional"""

            try:
                # For Claude models
                response = bedrock.invoke_model(
                    modelId=self.model_id,
                    body=json.dumps({
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": 1000,
                        "messages": [
                            {
                                "role": "user",
                                "content": prompt
                            }
                        ]
                    }),
                    contentType='application/json'
                )

                response_body = json.loads(response['body'].read())
                return response_body['content'][0]['text']

            except ClientError as e:
                print(f"Error generating response: {e}")
                return "I apologize, but I'm having trouble generating a response at the moment."

    def lambda_handler(event, context):
        """Main Lambda handler"""

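        # Extract query from event. API Gateway proxy events carry the JSON
        # payload as a string in 'body'; direct invocations pass 'query' at the top level.
        payload = event
        if isinstance(event.get('body'), str):
            try:
                payload = json.loads(event['body'])
            except json.JSONDecodeError:
                payload = {}
        if not isinstance(payload, dict):
            payload = {}
        query = (payload.get('query') or '').strip()
        if not query:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Query is required'})
            }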

        # Initialize orchestrator
        knowledge_base_id = os.environ['KNOWLEDGE_BASE_ID']
        orchestrator = RAGOrchestrator(knowledge_base_id)

        # Step 1: Retrieve relevant context
        retrieval_result = orchestrator.retrieve_context(query)

        if retrieval_result['total_results'] == 0:
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'response': "I couldn't find relevant information in our knowledge base to answer your question.",
                    'sources': []
                })
            }

        # Combine retrieved contexts
        combined_context = "\n\n".join([
            f"Source {i+1}:\n{ctx['content']}\n[Metadata: {ctx['metadata']}]"
            for i, ctx in enumerate(retrieval_result['contexts'])
        ])

        # Step 2: Generate response using LLM
        response = orchestrator.generate_response(query, combined_context)

        # Prepare sources for citation
        sources = [
            {
                'content': ctx['content'][:200] + '...',  # Preview
                'metadata': ctx['metadata'],
                'relevance_score': ctx['score']
            }
            for ctx in retrieval_result['contexts']
        ]

        return {
            'statusCode': 200,
            'body': json.dumps({
                'response': response,
                'sources': sources,
                'retrieved_context_count': retrieval_result['total_results']
            })
        }
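
If the web page is served from a different origin than the API, the browser will only accept responses that carry CORS headers. The helper below is a sketch of one way to add them, assuming the API Gateway proxy integration; `build_response` and the default `allowed_origin` are hypothetical names and values, and preflight `OPTIONS` requests still need to be answered, for example through API Gateway's own CORS configuration.

```python
import json


def build_response(status_code, payload, allowed_origin="*"):
    """Build an API Gateway proxy response that includes CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            # Restrict this to the frontend's origin outside of testing
            "Access-Control-Allow-Origin": allowed_origin,
            "Access-Control-Allow-Headers": "Content-Type",
            "Access-Control-Allow-Methods": "OPTIONS,POST",
        },
        "body": json.dumps(payload),
    }
```

Each `return` in the handler could then become, for instance, `return build_response(200, {'response': response, 'sources': sources})`.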

2.2 Deploy with AWS SAM (Optional)

Create a template.yaml for easy deployment:

    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: Enterprise RAG Chatbot

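    Parameters:
      KnowledgeBaseId:
        Type: String
        Description: ID of the Bedrock knowledge base created in Step 1

    Resources:
      RagChatbotFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: lambda/
          Handler: lambda_handler.lambda_handler
          Runtime: python3.9
          Timeout: 30
          MemorySize: 512
          Environment:
            Variables:
              KNOWLEDGE_BASE_ID: !Ref KnowledgeBaseId
          Policies:
            # Inline statements: retrieve from the knowledge base and invoke the model
            - Statement:
                - Effect: Allow
                  Action: bedrock:Retrieve
                  Resource: !Sub arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:knowledge-base/${KnowledgeBaseId}
                - Effect: Allow
                  Action: bedrock:InvokeModel
                  Resource: !Sub arn:aws:bedrock:${AWS::Region}::foundation-model/*
            - S3ReadPolicy:
                BucketName: !Ref DocumentBucket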
          Events:
            ApiEvent:
              Type: Api
              Properties:
                Path: /query
                Method: post

      DocumentBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: !Sub enterprise-docs-${AWS::AccountId}

    Outputs:
      ApiEndpoint:
        Description: "API Gateway endpoint URL"
        Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/query"

Step 3: Creating a Simple Web Interface

Create a basic HTML and JavaScript frontend (index.html), styled with Tailwind CSS:

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Enterprise RAG Chatbot</title>
        <script src="https://cdn.tailwindcss.com"></script>
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
    </head>
    <body class="bg-gray-50 min-h-screen">
        <div class="container mx-auto px-4 py-8 max-w-4xl">
            <header class="mb-8">
                <h1 class="text-3xl font-bold text-gray-800 mb-2">
                    <i class="fas fa-robot mr-3 text-blue-500"></i>
                    Enterprise Knowledge Assistant
                </h1>
                <p class="text-gray-600">Ask questions about your company documents, policies, and procedures.</p>
            </header>

            <div class="bg-white rounded-lg shadow-lg p-6 mb-6">
                <div id="chat-container" class="h-96 overflow-y-auto mb-4 p-4 border rounded-lg bg-gray-50">
                    <div class="text-center text-gray-500 py-8">
                        <i class="fas fa-comments text-3xl mb-3"></i>
                        <p>Start a conversation by typing your question below.</p>
                    </div>
                </div>

                <div class="flex space-x-4">
                    <input 
                        type="text" 
                        id="query-input" 
                        placeholder="Ask about company policies, procedures, or documents..." 
                        class="flex-grow p-3 border rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-blue-500 outline-none"
                    >
                    <button 
                        id="send-btn" 
                        class="bg-blue-500 text-white px-6 py-3 rounded-lg hover:bg-blue-600 transition font-semibold"
                    >
                        <i class="fas fa-paper-plane mr-2"></i>Ask
                    </button>
                </div>

                <div class="mt-4 text-sm text-gray-500">
                    <p><i class="fas fa-info-circle mr-1"></i> This chatbot searches through all company documents to find accurate answers.</p>
                </div>
            </div>

            <div id="sources-panel" class="bg-white rounded-lg shadow-lg p-6 hidden">
                <h3 class="text-lg font-semibold mb-4 text-gray-700">
                    <i class="fas fa-file-alt mr-2"></i>Sources Used
                </h3>
                <div id="sources-list"></div>
            </div>
        </div>

        <script>
            const API_ENDPOINT = 'YOUR_API_GATEWAY_ENDPOINT'; // Replace with your endpoint

            document.getElementById('send-btn').addEventListener('click', sendQuery);
            document.getElementById('query-input').addEventListener('keypress', (e) => {
                if (e.key === 'Enter') sendQuery();
            });

            async function sendQuery() {
                const queryInput = document.getElementById('query-input');
                const query = queryInput.value.trim();

                if (!query) return;

                // Add user message to chat
                addMessage(query, 'user');
                queryInput.value = '';

                // Show typing indicator
                const typingId = showTypingIndicator();

                try {
                    const response = await fetch(API_ENDPOINT, {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                        },
                        body: JSON.stringify({ query: query })
                    });

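                    // Treat non-2xx responses (such as the 400 for an empty query) as errors
                    if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
                    const data = await response.json();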

                    // Remove typing indicator
                    removeTypingIndicator(typingId);

                    // Add AI response
                    addMessage(data.response, 'ai');

                    // Show sources if available
                    if (data.sources && data.sources.length > 0) {
                        showSources(data.sources);
                    }

                } catch (error) {
                    console.error('Error:', error);
                    removeTypingIndicator(typingId);
                    addMessage('Sorry, there was an error processing your request.', 'ai');
                }
            }

            function addMessage(content, sender) {
                const chatContainer = document.getElementById('chat-container');

                const messageDiv = document.createElement('div');
                messageDiv.className = `mb-4 ${sender === 'user' ? 'text-right' : ''}`;

                const bubble = document.createElement('div');
                bubble.className = `inline-block p-4 rounded-lg max-w-xs md:max-w-md ${
                    sender === 'user' 
                        ? 'bg-blue-500 text-white rounded-br-none' 
                        : 'bg-gray-200 text-gray-800 rounded-bl-none'
                }`;

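                // Use textContent so user input and model output are never parsed as HTML
                const paragraph = document.createElement('p');
                paragraph.className = 'whitespace-pre-wrap';
                paragraph.textContent = content;
                bubble.appendChild(paragraph);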

                messageDiv.appendChild(bubble);
                chatContainer.appendChild(messageDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
            }

            function showTypingIndicator() {
                const chatContainer = document.getElementById('chat-container');
                const typingDiv = document.createElement('div');
                typingDiv.id = 'typing-indicator';
                typingDiv.className = 'mb-4';
                typingDiv.innerHTML = `
                    <div class="inline-block p-4 rounded-lg bg-gray-200 rounded-bl-none">
                        <div class="flex space-x-1">
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce"></div>
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.2s"></div>
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.4s"></div>
                        </div>
                    </div>
                `;
                chatContainer.appendChild(typingDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
                return 'typing-indicator';
            }

            function removeTypingIndicator(id) {
                const indicator = document.getElementById(id);
                if (indicator) indicator.remove();
            }

            function showSources(sources) {
                const sourcesPanel = document.getElementById('sources-panel');
                const sourcesList = document.getElementById('sources-list');

                sourcesPanel.classList.remove('hidden');
                sourcesList.innerHTML = '';

                sources.forEach((source, index) => {
                    const sourceDiv = document.createElement('div');
                    sourceDiv.className = 'mb-3 p-3 border rounded-lg hover:bg-gray-50';
                    sourceDiv.innerHTML = `
                        <div class="flex justify-between items-start">
                            <h4 class="font-medium text-gray-800">Source ${index + 1}</h4>
                            <span class="text-xs bg-blue-100 text-blue-800 px-2 py-1 rounded">Score: ${source.relevance_score.toFixed(3)}</span>
                        </div>
                        <p class="text-sm text-gray-600 mt-2">${source.content}</p>
                        <div class="text-xs text-gray-500 mt-2">
                            <i class="fas fa-tag mr-1"></i>${JSON.stringify(source.metadata)}
                        </div>
                    `;
                    sourcesList.appendChild(sourceDiv);
                });
            }
        </script>
    </body>
    </html>
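
Before opening the page in a browser, it can help to confirm that the endpoint answers at all. The sketch below sends the same JSON payload as the page's `fetch` call using only the Python standard library; the function names are hypothetical, and the endpoint URL is whatever your deployment printed as `ApiEndpoint`.

```python
import json
import urllib.request


def build_query_request(endpoint, query):
    """Create the same POST request that the web page sends."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(endpoint, query, timeout=30):
    """Send the query and return the decoded JSON response body."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `ask(API_ENDPOINT, "How do I submit an expense report?")` should return a dictionary with `response` and `sources` keys if the backend is wired up as described.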

Step 4: Advanced Features & Optimization

4.1 Implementing Conversation Memory

Add a DynamoDB table for conversation history:

    # Add to your Lambda function
    import json
    import boto3
    from datetime import datetime, timezone
    from decimal import Decimal
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource('dynamodb')
    # Assumed table schema: session_id (partition key), timestamp (sort key), ttl as the TTL attribute
    conversation_table = dynamodb.Table('RAGConversations')

    class ConversationManager:
        def __init__(self, session_id):
            self.session_id = session_id

        def save_interaction(self, query: str, response: str, sources: list):
            now = datetime.now(timezone.utc)

            conversation_table.put_item(
                Item={
                    'session_id': self.session_id,
                    'timestamp': now.isoformat(),
                    'query': query,
                    'response': response,
                    # DynamoDB rejects Python floats (such as relevance scores), so convert them to Decimal
                    'sources': json.loads(json.dumps(sources), parse_float=Decimal),
                    'ttl': int(now.timestamp()) + 86400  # 24-hour TTL
                }
            )

        def get_conversation_history(self, limit: int = 5):
            # Return the newest interactions for this session first
            response = conversation_table.query(
                KeyConditionExpression=Key('session_id').eq(self.session_id),
                ScanIndexForward=False,
                Limit=limit
            )
            return response.get('Items', [])
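
Stored history only helps if it reaches the model. One possible approach, sketched below, is to turn the most recent interactions into a short transcript and prepend it to the retrieved context before calling `generate_response`. The helper names and the transcript wording are assumptions for illustration; the items are expected newest first, as `get_conversation_history` returns them.

```python
def format_history(items, max_turns=5):
    """Turn stored interactions (newest first) into a chronological transcript."""
    recent = list(items)[:max_turns]
    lines = []
    for item in reversed(recent):  # oldest of the kept turns first
        lines.append(f"User: {item['query']}")
        lines.append(f"Assistant: {item['response']}")
    return "\n".join(lines)


def build_context_with_history(history_items, retrieved_context):
    """Prepend the transcript to the retrieved context, if there is any history."""
    transcript = format_history(history_items)
    if not transcript:
        return retrieved_context
    return (
        f"Previous conversation:\n{transcript}\n\n"
        f"Retrieved documents:\n{retrieved_context}"
    )
```

Keeping `max_turns` small limits how many extra input tokens each request carries.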

4.2 Adding Document-Level Access Control

Implement metadata filtering based on user roles:

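    def retrieve_with_access_control(query: str, user_roles: list, knowledge_base_id: str):
        # Build one 'equals' condition per role the user holds
        role_conditions = [
            {'equals': {'key': 'allowed_roles', 'value': user_role}}
            for user_role in user_roles
        ]
        if not role_conditions:
            raise ValueError("user_roles must contain at least one role")

        # A document is visible if it matches ANY of the user's roles.
        # orAll needs at least two conditions, so a single role is passed on its own.
        filter_conditions = (
            role_conditions[0] if len(role_conditions) == 1
            else {'orAll': role_conditions}
        )

        response = bedrock_agent_runtime.retrieve(
            knowledgeBaseId=knowledge_base_id,
            retrievalQuery={'text': query},
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'filter': filter_conditions,
                    'numberOfResults': 5
                }
            }
        )
        return response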
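
Filtering on `allowed_roles` only works if each document carries that attribute. For S3 data sources, Bedrock Knowledge Bases read metadata from a sidecar file named after the document with a `.metadata.json` suffix. The sketch below builds that key and body, assuming each document is tagged with a single role string; `metadata_sidecar` is a hypothetical helper name.

```python
import json


def metadata_sidecar(document_key, allowed_role):
    """Return (s3_key, body) for the metadata file that accompanies a document."""
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": {"allowed_roles": allowed_role}})
    return sidecar_key, body
```

The pair can then be uploaded next to the document, for example with `s3.put_object(Bucket=bucket, Key=key, Body=body)`, followed by a data source sync so the new attribute is indexed.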

Step 5: Testing & Validation

Test Cases to Validate Your RAG System:

    test_cases = [
        {
            "query": "What is our vacation policy for senior employees?",
            "expected_characteristics": ["should cite HR documents", "mention specific vacation days"]
        },
        {
            "query": "How do I submit an expense report?",
            "expected_characteristics": ["mention the expense portal", "provide step-by-step instructions"]
        },
        {
            "query": "What was our Q3 revenue?",
            "expected_characteristics": ["cite financial reports", "provide specific numbers"]
        }
    ]

    # Evaluation metrics to track:
    # 1. Response Relevance (0-5 scale)
    # 2. Citation Accuracy (are sources actually relevant?)
    # 3. Hallucination Rate (percentage of made-up information)
    # 4. Response Time (should be under 5 seconds)
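
A small harness can run these cases and record the measurable parts automatically. The sketch below assumes `ask` is any callable that takes a query string and returns a dictionary shaped like the Lambda response body; `run_test_cases` and its result fields are illustrative names. Relevance and hallucination still need human or model-based grading against `expected_characteristics`.

```python
import time


def run_test_cases(test_cases, ask, max_seconds=5.0):
    """Run each query through `ask` and record latency and basic checks."""
    results = []
    for case in test_cases:
        started = time.perf_counter()
        answer = ask(case["query"])
        elapsed = time.perf_counter() - started
        results.append({
            "query": case["query"],
            "latency_seconds": elapsed,
            "within_time_budget": elapsed <= max_seconds,
            "has_sources": len(answer.get("sources", [])) > 0,
            # Kept alongside the response so a reviewer can grade them together
            "expected_characteristics": case["expected_characteristics"],
            "response": answer.get("response", ""),
        })
    return results
```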

Cost Estimation & Optimization

Monthly Cost Breakdown (Estimated):

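- Amazon Bedrock (Claude 3 Sonnet): ~$3 per 1M input tokens (output tokens are billed separately at a higher rate)
- OpenSearch Serverless: ~$0.24 per OCU-hour (roughly $175/month for each OCU running continuously)
- Lambda: ~$0.20 per million requests, plus duration charges that depend on memory size and run time
- S3: ~$0.023 per GB-month of storage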
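
To turn the per-token price into a monthly figure, multiply expected traffic by average token counts. The sketch below uses the $3 per 1M input tokens quoted above; the output price default of $15 per 1M tokens and the traffic numbers in the example are assumptions, so check them against current Bedrock pricing and your own usage.

```python
def estimate_monthly_model_cost(queries_per_day, input_tokens_per_query,
                                output_tokens_per_query,
                                input_price_per_million=3.0,
                                output_price_per_million=15.0,  # assumed; verify
                                days_per_month=30):
    """Estimate monthly model spend in USD from average token counts."""
    monthly_queries = queries_per_day * days_per_month
    input_cost = (monthly_queries * input_tokens_per_query
                  / 1_000_000 * input_price_per_million)
    output_cost = (monthly_queries * output_tokens_per_query
                   / 1_000_000 * output_price_per_million)
    return round(input_cost + output_cost, 2)


# 1,000 queries/day, 2,000 input tokens (prompt plus retrieved context)
# and 300 output tokens per query
print(estimate_monthly_model_cost(1000, 2000, 300))  # 315.0
```

In this example the retrieved context dominates the input side, which is why reducing the number or size of retrieved passages has a direct effect on cost.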

Cost Optimization Tips:

- Use caching: Cache frequent queries in DynamoDB
- Implement query optimization: Use query rewriting to improve retrieval
- Monitor usage: Set up CloudWatch alarms for cost thresholds
- Consider smaller models: Use Claude Haiku for simpler queries
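
The caching tip can be sketched as follows. This is an in-memory stand-in to show the idea of normalizing a query into a key and expiring entries after a TTL; in the architecture described here the dictionary would be replaced by a DynamoDB table. The class name, key scheme, and one-hour default are all assumptions.

```python
import hashlib
import time


def cache_key(query):
    """Normalize case and whitespace so near-identical queries share a key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class QueryCache:
    """In-memory stand-in for a DynamoDB-backed cache with a TTL."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, query):
        entry = self._entries.get(cache_key(query))
        if entry is None or entry["expires_at"] <= self.clock():
            return None  # missing or expired
        return entry["value"]

    def put(self, query, value):
        self._entries[cache_key(query)] = {
            "value": value,
            "expires_at": self.clock() + self.ttl_seconds,
        }
```

A handler would call `get` before retrieval and `put` after a successful response. Note that cached answers go stale when documents change, so the TTL should reflect how often the knowledge base is synchronized.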

Best Practices for Production

Data Pipeline Management:
- Automate document ingestion with S3 Event Notifications
- Implement data quality checks before indexing
- Schedule regular knowledge base synchronization

Security:
- Encrypt data at rest (S3 SSE-S3/SSE-KMS)
- Implement API authentication (Cognito, API Keys)
- Use VPC endpoints for private access
- Enable Bedrock guardrails for content filtering

Monitoring:
- Track retrieval hit/miss rates
- Monitor response latency (95th percentile < 2s)
- Set up user feedback collection (thumbs up/down)
- Log all queries for compliance

Performance Tuning:
- Experiment with different embedding models
- Adjust chunking strategy (size, overlap)
- Implement query expansion techniques
- Use metadata filtering for better precision
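
To make the chunking parameters concrete, the sketch below splits text into word-based chunks where consecutive chunks share `overlap` words. It is only an illustration of what size and overlap mean; the knowledge base performs its own chunking during ingestion, and the function name and defaults here are assumptions.

```python
def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into word-based chunks that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Larger chunks give the model more surrounding context per passage, while more overlap reduces the chance that an answer is split across a chunk boundary, at the cost of indexing more text.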

Common Pitfalls & Solutions

| Pitfall | Solution |
| --- | --- |
| Poor retrieval quality | Implement hybrid search, adjust chunk sizes, add metadata filtering |
| Hallucinations | Add strict prompt instructions, implement confidence scoring |
| Slow response times | Add caching, optimize Lambda memory, use async processing |
| Irrelevant sources | Fine-tune embedding model, improve document preprocessing |
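
One simple form of confidence scoring is to drop retrieved passages whose relevance score falls below a threshold before they reach the prompt. The sketch below works on the `contexts` list produced by `retrieve_context`; the function name and the 0.4 threshold are hypothetical, and a suitable threshold has to be found by inspecting scores on your own data.

```python
def filter_by_score(contexts, min_score=0.4, max_contexts=5):
    """Keep the highest-scoring passages that clear a relevance threshold."""
    kept = [ctx for ctx in contexts if ctx.get("score", 0.0) >= min_score]
    kept.sort(key=lambda ctx: ctx.get("score", 0.0), reverse=True)
    return kept[:max_contexts]
```

If nothing clears the threshold, the handler can return its "couldn't find relevant information" message instead of asking the model to answer from weak context.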

Conclusion

Building an enterprise RAG chatbot with Amazon Bedrock provides a powerful, scalable solution for making proprietary data accessible through natural language. The managed services approach significantly reduces operational overhead while providing enterprise-grade security and reliability.

Key Advantages of This Architecture:

✅ No infrastructure management - Fully managed by AWS
✅ Enterprise security - Private, compliant, and secure
✅ Scalable - Lambda and Bedrock scale with traffic, within your account's service quotas
✅ Cost-effective - Pay-per-use pricing model
✅ Accurate - Grounded in your actual documents

Next Steps for Your Implementation:

- Start with a pilot department (e.g., HR or IT documentation)
- Collect user feedback and iterate on prompt engineering
- Implement advanced features like multi-modal support (images, tables)
- Consider fine-tuning embeddings on your domain-specific data
- Explore integration with existing systems (SharePoint, Confluence, Salesforce)

Need help implementing this? Have questions about specific use cases? Leave a comment below.

Ready to deploy? Use the AWS CloudFormation template below for a one-click deployment:

    # Save as rag-chatbot-cfn.yaml
    # Deploy with: aws cloudformation create-stack --stack-name enterprise-rag-chatbot --template-body file://rag-chatbot-cfn.yaml
