Beyond ChatGPT: Building Your Own Enterprise RAG Chatbot with Amazon Bedrock & Knowledge Bases

Software Engineer / AWS Cloud Engineer / DevOps Engineer / Team Lead

Introduction: The Limitations of Generic LLMs

While ChatGPT has revolutionized how we interact with AI, enterprises face critical challenges when using generic large language models:

  1. Outdated Knowledge: Models are trained on data up to a specific cutoff date

  2. No Access to Proprietary Data: Cannot answer questions about your internal documents, policies, or databases

  3. Hallucination Risk: Models may invent plausible-sounding but incorrect information

  4. Security Concerns: Sensitive data exposure when using public APIs

The solution? Retrieval-Augmented Generation (RAG) - a technique that combines the power of LLMs with your proprietary data. In this comprehensive guide, we'll build a production-ready enterprise chatbot using AWS's managed services.
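
To make the idea concrete, here is a toy sketch of the retrieve-then-augment flow. It is not how Bedrock works internally: word overlap stands in for vector search, and the two sample documents and the function names `retrieve` and `build_prompt` are hypothetical.

```python
from typing import List


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by how many query words they share (a toy stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # Keep only documents with at least one shared word, best match first
    ranked = sorted((item for item in scored if item[0] > 0), key=lambda item: item[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, passages: List[str]) -> str:
    """Augment the question with the retrieved passages before it is sent to the model."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample documents
docs = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are submitted through the finance portal.",
]
question = "How many vacation days do employees get?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

Only the vacation document shares words with the question, so only that passage reaches the prompt. The rest of this guide replaces the toy retriever with a Bedrock Knowledge Base and the final step with a Claude model.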

Architecture Overview

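Here's what we're building: a browser-based chat page sends each question to an API Gateway endpoint, which invokes a Lambda function. The Lambda retrieves relevant passages from a Bedrock Knowledge Base (documents stored in S3 and indexed in an OpenSearch Serverless vector store) and passes them, together with the question, to a Claude model on Bedrock to generate a grounded answer.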

Prerequisites

Before we begin, ensure you have:

  1. AWS Account with appropriate permissions

  2. Amazon Bedrock Access requested (go to Bedrock console → Model access)

  3. Python 3.9+ and AWS CLI configured

  4. Sample documents for testing (PDFs, Word docs, text files)

Step 1: Setting Up the Knowledge Base

1.1 Create an S3 Bucket for Your Documents

    # Create a unique bucket name
    BUCKET_NAME="enterprise-rag-documents-$(date +%s)"
    aws s3 mb s3://$BUCKET_NAME

    # Upload sample documents
    aws s3 cp ./documents/ s3://$BUCKET_NAME/ --recursive
    

1.2 Configure Amazon Bedrock Knowledge Base

Navigate to Amazon Bedrock → Knowledge Bases → Create Knowledge Base

Configuration Parameters:

  • Knowledge base name: enterprise-knowledge-base

  • IAM role: Create new role with S3 and Bedrock permissions

  • Data source: Your S3 bucket

  • Embeddings model: amazon.titan-embed-text-v2:0 (default)

  • Vector database: Choose "Quick create a new vector store"

  • Advanced settings: Enable hybrid search for better results

The same settings expressed as API request fields (note that the data source is created with a separate call once the knowledge base exists; the two are shown together here for reference):

    {
      "knowledgeBaseConfiguration": {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
          "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        }
      },
      "storageConfiguration": {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
          "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/your-collection",
          "vectorIndexName": "enterprise-docs-index",
          "fieldMapping": {
            "vectorField": "embedding",
            "textField": "content",
            "metadataField": "metadata"
          }
        }
      },
      "dataSourceConfiguration": {
        "type": "S3",
        "s3Configuration": {
          "bucketArn": "arn:aws:s3:::your-documents-bucket",
          "inclusionPrefixes": ["documents/"]
        }
      }
    }
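
If you prefer scripting over the console, the configuration above maps onto two separate calls on the `bedrock-agent` client: one that creates the knowledge base and one that attaches the S3 data source. The sketch below only splits the combined document into the two argument sets; the helper name `split_config` and the shortened sample values are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_config(combined: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split the combined configuration into knowledge-base and data-source arguments."""
    kb_args = {
        "knowledgeBaseConfiguration": combined["knowledgeBaseConfiguration"],
        "storageConfiguration": combined["storageConfiguration"],
    }
    ds_args = {"dataSourceConfiguration": combined["dataSourceConfiguration"]}
    return kb_args, ds_args


# Shortened, placeholder version of the configuration shown above
combined = {
    "knowledgeBaseConfiguration": {"type": "VECTOR"},
    "storageConfiguration": {"type": "OPENSEARCH_SERVERLESS"},
    "dataSourceConfiguration": {"type": "S3"},
}
kb_args, ds_args = split_config(combined)

# Usage sketch (needs boto3, AWS credentials, and a role ARN; names are placeholders):
# agent = boto3.client("bedrock-agent")
# kb = agent.create_knowledge_base(name="enterprise-knowledge-base", roleArn=ROLE_ARN, **kb_args)
# kb_id = kb["knowledgeBase"]["knowledgeBaseId"]
# ds = agent.create_data_source(knowledgeBaseId=kb_id, name="enterprise-docs", **ds_args)
# agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds["dataSource"]["dataSourceId"])
```

The ingestion job at the end is what actually chunks, embeds, and indexes the uploaded documents, so retrieval returns nothing until it has completed.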

Step 2: Building the Backend Orchestrator

2.1 Create Lambda Function with Dependencies

Create a requirements.txt:

    boto3>=1.28.0
    aws-lambda-powertools>=2.0.0
    python-dotenv>=1.0.0

Create the Lambda function:

    # lambda_handler.py
    import json
    import boto3
    import os
    from typing import Dict, Any
    from botocore.exceptions import ClientError

    # Initialize AWS clients
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
    bedrock = boto3.client('bedrock-runtime')

    class RAGOrchestrator:
        def __init__(self, knowledge_base_id: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
            self.knowledge_base_id = knowledge_base_id
            self.model_id = model_id
            self.region = os.environ.get('AWS_REGION', 'us-east-1')

        def retrieve_context(self, query: str, max_results: int = 5) -> Dict[str, Any]:
            """Retrieve relevant context from knowledge base"""
            try:
                response = bedrock_agent_runtime.retrieve(
                    knowledgeBaseId=self.knowledge_base_id,
                    retrievalQuery={
                        'text': query
                    },
                    retrievalConfiguration={
                        'vectorSearchConfiguration': {
                            'numberOfResults': max_results,
                            'overrideSearchType': 'HYBRID'
                        }
                    }
                )

                # Extract and format retrieved passages
                contexts = []
                for result in response.get('retrievalResults', []):
                    contexts.append({
                        'content': result['content']['text'],
                        'metadata': result.get('metadata', {}),
                        'score': result.get('score', 0.0)
                    })

                return {
                    'contexts': contexts,
                    'total_results': len(contexts)
                }

            except ClientError as e:
                print(f"Error retrieving context: {e}")
                return {'contexts': [], 'total_results': 0}

        def generate_response(self, query: str, context: str) -> str:
            """Generate response using LLM with retrieved context"""

            # Prepare the prompt with context. The Messages API supplies the
            # user/assistant turns, so no "Human:"/"Assistant:" markers are needed.
            prompt = f"""You are an expert assistant for our enterprise. Use the following context to answer the question.

            Context:
            {context}

            Question: {query}

            Instructions:
            1. Answer based ONLY on the provided context
            2. If the context doesn't contain relevant information, say "I don't have enough information to answer this question based on the available documents."
            3. Cite specific sources when possible
            4. Keep the response concise and professional"""

            try:
                # For Claude models
                response = bedrock.invoke_model(
                    modelId=self.model_id,
                    body=json.dumps({
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": 1000,
                        "messages": [
                            {
                                "role": "user",
                                "content": prompt
                            }
                        ]
                    }),
                    contentType='application/json'
                )

                response_body = json.loads(response['body'].read())
                return response_body['content'][0]['text']

            except ClientError as e:
                print(f"Error generating response: {e}")
                return "I apologize, but I'm having trouble generating a response at the moment."

    def lambda_handler(event, context):
        """Main Lambda handler"""

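        # Extract the query. API Gateway proxy events carry the JSON payload as a
        # string in event['body']; direct invocations pass fields at the top level.
        body = event.get('body')
        try:
            payload = json.loads(body) if isinstance(body, str) else event
        except json.JSONDecodeError:
            payload = {}
        query = str(payload.get('query', '')).strip() if isinstance(payload, dict) else ''
        if not query:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Query is required'})
            }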

        # Initialize orchestrator
        knowledge_base_id = os.environ['KNOWLEDGE_BASE_ID']
        orchestrator = RAGOrchestrator(knowledge_base_id)

        # Step 1: Retrieve relevant context
        retrieval_result = orchestrator.retrieve_context(query)

        if retrieval_result['total_results'] == 0:
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'response': "I couldn't find relevant information in our knowledge base to answer your question.",
                    'sources': []
                })
            }

        # Combine retrieved contexts
        combined_context = "\n\n".join([
            f"Source {i+1}:\n{ctx['content']}\n[Metadata: {ctx['metadata']}]"
            for i, ctx in enumerate(retrieval_result['contexts'])
        ])

        # Step 2: Generate response using LLM
        response = orchestrator.generate_response(query, combined_context)

        # Prepare sources for citation
        sources = [
            {
                'content': ctx['content'][:200] + '...',  # Preview
                'metadata': ctx['metadata'],
                'relevance_score': ctx['score']
            }
            for ctx in retrieval_result['contexts']
        ]

        return {
            'statusCode': 200,
            'body': json.dumps({
                'response': response,
                'sources': sources,
                'retrieved_context_count': retrieval_result['total_results']
            })
        }
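
Because the web page in Step 3 calls this API from the browser, the responses generally need CORS headers when the page and the API live on different origins. A minimal sketch of a response helper follows; the name `build_response` and the wildcard default origin are assumptions, not part of the handler above.

```python
import json
from typing import Any, Dict


def build_response(status_code: int, payload: Dict[str, Any], allowed_origin: str = "*") -> Dict[str, Any]:
    """Build an API Gateway proxy response that a browser on another origin can read."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            # Replace "*" with your site's origin in production
            "Access-Control-Allow-Origin": allowed_origin,
        },
        "body": json.dumps(payload),
    }


error_response = build_response(400, {"error": "Query is required"})
```

Each `return {...}` in the handler could then become a `build_response(...)` call. A JSON POST also triggers a preflight OPTIONS request from the browser, which API Gateway has to answer as well, for example through the SAM `Cors` setting.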

2.2 Deploy with AWS SAM (Optional)

Create a template.yaml for easy deployment:

    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: Enterprise RAG Chatbot

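    Parameters:
      KnowledgeBaseId:
        Type: String
        Description: ID of the Bedrock knowledge base created in Step 1

    Resources:
      RagChatbotFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: lambda/
          Handler: lambda_handler.lambda_handler
          Runtime: python3.9
          Timeout: 30
          MemorySize: 512
          Environment:
            Variables:
              KNOWLEDGE_BASE_ID: !Ref KnowledgeBaseId
          Policies:
            # Inline statements for the two Bedrock calls the handler makes
            - Statement:
                - Effect: Allow
                  Action: bedrock:Retrieve
                  Resource: !Sub "arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:knowledge-base/${KnowledgeBaseId}"
                - Effect: Allow
                  Action: bedrock:InvokeModel
                  Resource: !Sub "arn:aws:bedrock:${AWS::Region}::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
            - S3ReadPolicy:
                BucketName: !Ref DocumentBucket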
          Events:
            ApiEvent:
              Type: Api
              Properties:
                Path: /query
                Method: post

      DocumentBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: !Sub enterprise-docs-${AWS::AccountId}

    Outputs:
      ApiEndpoint:
        Description: "API Gateway endpoint URL"
        Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/query"

Step 3: Creating a Simple Web Interface

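Create a basic frontend as a single HTML page (index.html) using plain JavaScript and Tailwind CSS. Because the page calls the API from the browser, the API must allow cross-origin requests (CORS) unless both are served from the same origin: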

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Enterprise RAG Chatbot</title>
        <script src="https://cdn.tailwindcss.com"></script>
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
    </head>
    <body class="bg-gray-50 min-h-screen">
        <div class="container mx-auto px-4 py-8 max-w-4xl">
            <header class="mb-8">
                <h1 class="text-3xl font-bold text-gray-800 mb-2">
                    <i class="fas fa-robot mr-3 text-blue-500"></i>
                    Enterprise Knowledge Assistant
                </h1>
                <p class="text-gray-600">Ask questions about your company documents, policies, and procedures.</p>
            </header>

            <div class="bg-white rounded-lg shadow-lg p-6 mb-6">
                <div id="chat-container" class="h-96 overflow-y-auto mb-4 p-4 border rounded-lg bg-gray-50">
                    <div class="text-center text-gray-500 py-8">
                        <i class="fas fa-comments text-3xl mb-3"></i>
                        <p>Start a conversation by typing your question below.</p>
                    </div>
                </div>

                <div class="flex space-x-4">
                    <input 
                        type="text" 
                        id="query-input" 
                        placeholder="Ask about company policies, procedures, or documents..." 
                        class="flex-grow p-3 border rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-blue-500 outline-none"
                    >
                    <button 
                        id="send-btn" 
                        class="bg-blue-500 text-white px-6 py-3 rounded-lg hover:bg-blue-600 transition font-semibold"
                    >
                        <i class="fas fa-paper-plane mr-2"></i>Ask
                    </button>
                </div>

                <div class="mt-4 text-sm text-gray-500">
                    <p><i class="fas fa-info-circle mr-1"></i> This chatbot searches through all company documents to find accurate answers.</p>
                </div>
            </div>

            <div id="sources-panel" class="bg-white rounded-lg shadow-lg p-6 hidden">
                <h3 class="text-lg font-semibold mb-4 text-gray-700">
                    <i class="fas fa-file-alt mr-2"></i>Sources Used
                </h3>
                <div id="sources-list"></div>
            </div>
        </div>

        <script>
            const API_ENDPOINT = 'YOUR_API_GATEWAY_ENDPOINT'; // Replace with your endpoint

            document.getElementById('send-btn').addEventListener('click', sendQuery);
            document.getElementById('query-input').addEventListener('keypress', (e) => {
                if (e.key === 'Enter') sendQuery();
            });

            async function sendQuery() {
                const queryInput = document.getElementById('query-input');
                const query = queryInput.value.trim();

                if (!query) return;

                // Add user message to chat
                addMessage(query, 'user');
                queryInput.value = '';

                // Show typing indicator
                const typingId = showTypingIndicator();

                try {
                    const response = await fetch(API_ENDPOINT, {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                        },
                        body: JSON.stringify({ query: query })
                    });

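                    // Surface HTTP errors (4xx/5xx) instead of rendering an undefined response
                    if (!response.ok) {
                        throw new Error(`Request failed with status ${response.status}`);
                    }

                    const data = await response.json();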

                    // Remove typing indicator
                    removeTypingIndicator(typingId);

                    // Add AI response
                    addMessage(data.response, 'ai');

                    // Show sources if available
                    if (data.sources && data.sources.length > 0) {
                        showSources(data.sources);
                    }

                } catch (error) {
                    console.error('Error:', error);
                    removeTypingIndicator(typingId);
                    addMessage('Sorry, there was an error processing your request.', 'ai');
                }
            }

            function addMessage(content, sender) {
                const chatContainer = document.getElementById('chat-container');

                const messageDiv = document.createElement('div');
                messageDiv.className = `mb-4 ${sender === 'user' ? 'text-right' : ''}`;

                const bubble = document.createElement('div');
                bubble.className = `inline-block p-4 rounded-lg max-w-xs md:max-w-md ${
                    sender === 'user' 
                        ? 'bg-blue-500 text-white rounded-br-none' 
                        : 'bg-gray-200 text-gray-800 rounded-bl-none'
                }`;

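                // Use textContent so user input and model output are never parsed as HTML
                const text = document.createElement('p');
                text.className = 'whitespace-pre-wrap';
                text.textContent = content;
                bubble.appendChild(text);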

                messageDiv.appendChild(bubble);
                chatContainer.appendChild(messageDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
            }

            function showTypingIndicator() {
                const chatContainer = document.getElementById('chat-container');
                const typingDiv = document.createElement('div');
                typingDiv.id = 'typing-indicator';
                typingDiv.className = 'mb-4';
                typingDiv.innerHTML = `
                    <div class="inline-block p-4 rounded-lg bg-gray-200 rounded-bl-none">
                        <div class="flex space-x-1">
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce"></div>
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.2s"></div>
                            <div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.4s"></div>
                        </div>
                    </div>
                `;
                chatContainer.appendChild(typingDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
                return 'typing-indicator';
            }

            function removeTypingIndicator(id) {
                const indicator = document.getElementById(id);
                if (indicator) indicator.remove();
            }

            function showSources(sources) {
                const sourcesPanel = document.getElementById('sources-panel');
                const sourcesList = document.getElementById('sources-list');

                sourcesPanel.classList.remove('hidden');
                sourcesList.innerHTML = '';

                // Escape retrieved text so document content is never parsed as HTML
                const escapeHtml = (value) => {
                    const el = document.createElement('div');
                    el.textContent = String(value);
                    return el.innerHTML;
                };

                sources.forEach((source, index) => {
                    const sourceDiv = document.createElement('div');
                    sourceDiv.className = 'mb-3 p-3 border rounded-lg hover:bg-gray-50';
                    sourceDiv.innerHTML = `
                        <div class="flex justify-between items-start">
                            <h4 class="font-medium text-gray-800">Source ${index + 1}</h4>
                            <span class="text-xs bg-blue-100 text-blue-800 px-2 py-1 rounded">Score: ${Number(source.relevance_score).toFixed(3)}</span>
                        </div>
                        <p class="text-sm text-gray-600 mt-2">${escapeHtml(source.content)}</p>
                        <div class="text-xs text-gray-500 mt-2">
                            <i class="fas fa-tag mr-1"></i>${escapeHtml(JSON.stringify(source.metadata))}
                        </div>
                    `;
                    sourcesList.appendChild(sourceDiv);
                });
            }
        </script>
    </body>
    </html>
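
Before wiring up the page, it can help to smoke-test the endpoint from a script. The sketch below builds the same POST request the page sends without actually sending it; the helper name `build_query_request` and the `example.invalid` URL are placeholders.

```python
import json
import urllib.request


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build the same JSON POST request the web page sends, without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint; substitute the ApiEndpoint output of your own stack
request = build_query_request("https://example.invalid/Prod/query", "What is our vacation policy?")

# To call a deployed API, uncomment:
# with urllib.request.urlopen(request, timeout=30) as resp:
#     print(json.loads(resp.read())["response"])
```

If the script succeeds but the browser page fails, the difference is usually CORS, since scripts are not subject to the browser's same-origin checks.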

Step 4: Advanced Features & Optimization

4.1 Implementing Conversation Memory

Add a DynamoDB table for conversation history:

    # Add to your Lambda function
    import json
    from datetime import datetime, timezone
    from decimal import Decimal

    import boto3

    dynamodb = boto3.resource('dynamodb')
    # Assumes a table with partition key 'session_id', sort key 'timestamp',
    # and DynamoDB TTL enabled on the 'ttl' attribute
    conversation_table = dynamodb.Table('RAGConversations')

    class ConversationManager:
        def __init__(self, session_id):
            self.session_id = session_id

        def save_interaction(self, query: str, response: str, sources: list):
            now = datetime.now(timezone.utc)

            conversation_table.put_item(
                Item={
                    'session_id': self.session_id,
                    'timestamp': now.isoformat(),
                    'query': query,
                    'response': response,
                    # DynamoDB rejects Python floats (e.g. relevance scores), so convert them to Decimal
                    'sources': json.loads(json.dumps(sources), parse_float=Decimal),
                    'ttl': int(now.timestamp()) + 86400  # 24-hour TTL
                }
            )

        def get_conversation_history(self, limit: int = 5):
            # Returns the most recent interactions, newest first
            response = conversation_table.query(
                KeyConditionExpression='session_id = :sid',
                ExpressionAttributeValues={':sid': self.session_id},
                ScanIndexForward=False,
                Limit=limit
            )
            return response.get('Items', [])
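
Stored history only helps if it reaches the model. One way to use it, sketched below under the assumption that items carry the `query` and `response` attributes saved above, is to render the last few turns as a transcript and prepend it to the prompt. The function name `format_history` is hypothetical.

```python
from typing import Dict, List


def format_history(items: List[Dict[str, str]], max_turns: int = 3) -> str:
    """Render stored turns as a transcript, oldest first, for inclusion in the prompt."""
    # Items arrive newest first, so take the latest turns and reverse them
    recent = list(reversed(items[:max_turns]))
    lines = []
    for item in recent:
        lines.append(f"User: {item['query']}")
        lines.append(f"Assistant: {item['response']}")
    return "\n".join(lines)


# Hypothetical items in the order the history query returns them (newest first)
history = [
    {"query": "Does it carry over?", "response": "Up to 5 days carry over."},
    {"query": "What is the vacation policy?", "response": "20 days per year."},
]
transcript = format_history(history)
```

Keeping `max_turns` small limits the extra input tokens added to every request.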

4.2 Adding Document-Level Access Control

Implement metadata filtering based on user roles:

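    def retrieve_with_access_control(query: str, user_roles: list, knowledge_base_id: str):
        if not user_roles:
            raise ValueError("user_roles must contain at least one role")

        # One condition per role: the document's 'allowed_roles' metadata must equal that role
        role_filters = [
            {'equals': {'key': 'allowed_roles', 'value': user_role}}
            for user_role in user_roles
        ]
        # A document is visible if it matches ANY of the user's roles, hence orAll.
        # orAll needs at least two conditions, so a single role is passed as a bare filter.
        filter_conditions = role_filters[0] if len(role_filters) == 1 else {'orAll': role_filters}

        response = bedrock_agent_runtime.retrieve(
            knowledgeBaseId=knowledge_base_id,
            retrievalQuery={'text': query},
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'filter': filter_conditions,
                    'numberOfResults': 5
                }
            }
        )
        return response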
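
The filter only works if each document carries an `allowed_roles` attribute. As far as I know, Bedrock Knowledge Bases read such attributes from a companion file stored next to the document in S3 and named `<document>.metadata.json`. The sketch below builds that file for one document; the helper name `build_metadata_sidecar` and the sample key are hypothetical.

```python
import json
from typing import Tuple


def build_metadata_sidecar(document_key: str, allowed_role: str) -> Tuple[str, str]:
    """Return the S3 key and JSON body of the metadata file for one document."""
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": {"allowed_roles": allowed_role}})
    return sidecar_key, body


sidecar_key, sidecar_body = build_metadata_sidecar("documents/hr/vacation-policy.pdf", "hr")

# Upload sketch (needs boto3 and credentials; bucket name is a placeholder):
# boto3.client("s3").put_object(Bucket=BUCKET_NAME, Key=sidecar_key, Body=sidecar_body)
```

After uploading or changing metadata files, the data source has to be synchronized again before the new attributes are available for filtering.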

Step 5: Testing & Validation

Test Cases to Validate Your RAG System:

    test_cases = [
        {
            "query": "What is our vacation policy for senior employees?",
            "expected_characteristics": ["should cite HR documents", "mention specific vacation days"]
        },
        {
            "query": "How do I submit an expense report?",
            "expected_characteristics": ["mention the expense portal", "provide step-by-step instructions"]
        },
        {
            "query": "What was our Q3 revenue?",
            "expected_characteristics": ["cite financial reports", "provide specific numbers"]
        }
    ]

    # Evaluation metrics to track:
    # 1. Response Relevance (0-5 scale)
    # 2. Citation Accuracy (are sources actually relevant?)
    # 3. Hallucination Rate (percentage of made-up information)
    # 4. Response Time (should be under 5 seconds)
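
A small harness can automate part of this. The sketch below measures latency and checks for expected keywords; it does not score relevance or detect hallucinations, which still need human or model-based review. The `expected_keywords` field and the `evaluate` function are assumptions added for illustration, and `ask` stands for whatever function calls your deployed API.

```python
import time
from typing import Callable, Dict, List


def evaluate(cases: List[Dict], ask: Callable[[str], str], max_seconds: float = 5.0) -> List[Dict]:
    """Run each query through `ask` and record latency plus simple keyword checks."""
    results = []
    for case in cases:
        start = time.perf_counter()
        answer = ask(case["query"])
        elapsed = time.perf_counter() - start
        # Case-insensitive check for each keyword the answer is expected to contain
        missing = [kw for kw in case.get("expected_keywords", []) if kw.lower() not in answer.lower()]
        results.append({
            "query": case["query"],
            "latency_ok": elapsed <= max_seconds,
            "missing_keywords": missing,
        })
    return results


# Stub standing in for a real call to the chatbot API
def fake_ask(query: str) -> str:
    return "Submit it through the Expense Portal under Finance."


sample_cases = [{"query": "How do I submit an expense report?", "expected_keywords": ["expense portal"]}]
report = evaluate(sample_cases, fake_ask)
```

Running the same cases after every prompt or chunking change gives a quick regression signal before users see the difference.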

Cost Estimation & Optimization

Monthly Cost Breakdown (Estimated):

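  • Amazon Bedrock (Claude 3 Sonnet): ~$3 per 1M input tokens and ~$15 per 1M output tokens

  • OpenSearch Serverless: ~$0.30 per OCU-hour (one OCU running all month is ~$219; a collection keeps a minimum number of OCUs running, so the baseline is a multiple of that)

  • Lambda: ~$0.20 per million requests, plus duration charges of ~$0.0000167 per GB-second (about $0.000025 per 3 s invocation at 512 MB)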

  • S3: ~$0.023 per GB storage
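
To turn these unit prices into a monthly figure, a rough calculator like the one below can help. It is a sketch: the rates are the approximate ones listed here, while the output-token price, the Lambda duration rate, the 730-hour month, and the sample workload are my assumptions. Check current AWS pricing for your region before relying on the numbers.

```python
from typing import Dict


def estimate_monthly_cost(queries: int, input_tokens: int, output_tokens: int,
                          ocu_count: float, storage_gb: float,
                          input_price: float = 3.0, output_price: float = 15.0,
                          ocu_hour_rate: float = 0.30,
                          avg_seconds: float = 3.0, memory_gb: float = 0.5) -> Dict[str, float]:
    """Estimate monthly cost in USD; token prices are per million tokens."""
    hours_per_month = 730  # assumption: average month length
    bedrock = queries * (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    opensearch = ocu_count * ocu_hour_rate * hours_per_month
    # Lambda: request charge plus duration charge (assumed rate per GB-second)
    lambda_cost = queries / 1_000_000 * 0.20 + queries * avg_seconds * memory_gb * 0.0000166667
    s3 = storage_gb * 0.023
    total = bedrock + opensearch + lambda_cost + s3
    return {"bedrock": bedrock, "opensearch": opensearch, "lambda": lambda_cost, "s3": s3, "total": total}


# Hypothetical workload: 10,000 queries/month, 2,000 input and 300 output tokens each
estimate = estimate_monthly_cost(queries=10_000, input_tokens=2_000, output_tokens=300,
                                 ocu_count=2, storage_gb=10)
for item, cost in estimate.items():
    print(f"{item}: ${cost:.2f}")
```

For a workload of this size the vector store dominates the bill because it is charged by the hour regardless of traffic, while model and Lambda costs scale with query volume.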

Cost Optimization Tips:

  1. Use caching: Cache frequent queries in DynamoDB

  2. Implement query optimization: Use query rewriting to improve retrieval

  3. Monitor usage: Set up CloudWatch alarms for cost thresholds

  4. Consider smaller models: Use Claude Haiku for simpler queries
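
As an illustration of the caching tip, the sketch below normalizes queries into a stable key and stores answers with an expiry time. It uses an in-memory dictionary as a stand-in for a DynamoDB table with a TTL attribute; the names `cache_key` and `TTLCache` are hypothetical.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple


def cache_key(query: str) -> str:
    """Normalize case and whitespace so trivially different phrasings share one key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a DynamoDB cache table with a TTL attribute."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl_seconds = ttl_seconds
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, query: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._items.get(cache_key(query))
        # Treat missing and expired entries the same way
        if entry is None or entry[0] <= now:
            return None
        return entry[1]

    def put(self, query: str, answer: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._items[cache_key(query)] = (now + self.ttl_seconds, answer)


cache = TTLCache(ttl_seconds=10)
cache.put("What is  our PTO policy?", "20 days per year", now=0)
```

In the Lambda handler, a cache lookup would run before retrieval, and a hit would skip both the knowledge base query and the model call. If access control is enabled, the user's roles should be part of the key so that cached answers never cross permission boundaries.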

Best Practices for Production

  1. Data Pipeline Management:

    • Automate document ingestion with S3 Event Notifications

    • Implement data quality checks before indexing

    • Schedule regular knowledge base synchronization

  2. Security:

    • Encrypt data at rest (S3 SSE-S3/SSE-KMS)

    • Implement API authentication (Cognito, API Keys)

    • Use VPC endpoints for private access

    • Enable Bedrock guardrails for content filtering

  3. Monitoring:

    • Track retrieval hit/miss rates

    • Monitor response latency (95th percentile < 2s)

    • Set up user feedback collection (thumbs up/down)

    • Log all queries for compliance

  4. Performance Tuning:

    • Experiment with different embedding models

    • Adjust chunking strategy (size, overlap)

    • Implement query expansion techniques

    • Use metadata filtering for better precision
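
To show what chunk size and overlap mean in practice, here is a minimal word-based chunker. In Bedrock Knowledge Bases chunking is configured on the data source rather than written by hand, so treat this as an illustration of the concept, with the function name `chunk_text` and the default sizes as assumptions.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
sample_chunks = chunk_text(sample, chunk_size=4, overlap=1)
```

With a chunk size of 4 and an overlap of 1, each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk. Larger overlaps improve recall at the cost of more stored vectors.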

Common Pitfalls & Solutions

Pitfall | Solution
Poor retrieval quality | Implement hybrid search, adjust chunk sizes, add metadata filtering
Hallucinations | Add strict prompt instructions, implement confidence scoring
Slow response times | Add caching, optimize Lambda memory, use async processing
Irrelevant sources | Fine-tune embedding model, improve document preprocessing
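
One simple form of confidence scoring is to drop weakly matching passages before they reach the model and to decline when nothing passes. The sketch below assumes contexts shaped like those returned by `retrieve_context` in Step 2 (each with a `score`); the function name `filter_by_score` and the 0.4 threshold are arbitrary assumptions that should be tuned against your own data.

```python
from typing import Dict, List


def filter_by_score(contexts: List[Dict], min_score: float = 0.4) -> List[Dict]:
    """Keep only passages whose retrieval score meets the threshold."""
    return [ctx for ctx in contexts if ctx.get("score", 0.0) >= min_score]


# Hypothetical retrieval results
retrieved = [
    {"content": "Employees accrue 20 vacation days per year.", "score": 0.82},
    {"content": "The cafeteria opens at 8 am.", "score": 0.12},
]
confident = filter_by_score(retrieved)
```

If the filtered list is empty, the handler can return its "couldn't find relevant information" message without calling the model at all, which avoids both a hallucination risk and the token cost.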

Conclusion

Building an enterprise RAG chatbot with Amazon Bedrock provides a powerful, scalable solution for making proprietary data accessible through natural language. The managed services approach significantly reduces operational overhead while providing enterprise-grade security and reliability.

Key Advantages of This Architecture:

  • No infrastructure management - Fully managed by AWS

  • Enterprise security - Private, compliant, and secure

  • Scalable - Lambda, API Gateway, and Bedrock scale with demand, within your account's service quotas

  • Cost-effective - Pay-per-use pricing model

  • Accurate - Grounded in your actual documents

Next Steps for Your Implementation:

  1. Start with a pilot department (e.g., HR or IT documentation)

  2. Collect user feedback and iterate on prompt engineering

  3. Implement advanced features like multi-modal support (images, tables)

  4. Consider fine-tuning embeddings on your domain-specific data

  5. Explore integration with existing systems (SharePoint, Confluence, Salesforce)


Need help implementing this? Have questions about specific use cases? Leave a comment below or reach out to me.


Ready to deploy? Use the AWS CloudFormation template below for a one-click deployment:

    # Save as rag-chatbot-cfn.yaml
    # Deploy with: aws cloudformation create-stack --stack-name enterprise-rag-chatbot --template-body file://rag-chatbot-cfn.yaml