Beyond ChatGPT: Building Your Own Enterprise RAG Chatbot with Amazon Bedrock & Knowledge Bases

Introduction: The Limitations of Generic LLMs
While ChatGPT has revolutionized how we interact with AI, enterprises face critical challenges when using generic large language models:
Outdated Knowledge: Models are trained on data up to a specific cutoff date
No Access to Proprietary Data: Cannot answer questions about your internal documents, policies, or databases
Hallucination Risk: Models may invent plausible-sounding but incorrect information
Security Concerns: Sensitive data exposure when using public APIs
The solution? Retrieval-Augmented Generation (RAG) - a technique that combines the power of LLMs with your proprietary data. In this comprehensive guide, we'll build a production-ready enterprise chatbot using AWS's managed services.
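To make the retrieve-then-generate idea concrete before bringing in any AWS services, here is a toy sketch in plain Python. It is only an illustration: word overlap stands in for the vector search that a knowledge base performs, and the sample documents and function names are invented for this example.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    matches = [item for item in scored if item[0] > 0]
    matches.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in matches[:top_k]]

def build_prompt(query, passages):
    """Place the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Invented sample documents, for illustration only
DOCS = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are submitted through the finance portal.",
    "The office is closed on public holidays.",
]
question = "How many vacation days do employees get?"
passages = retrieve(question, DOCS)
prompt = build_prompt(question, passages)
```

The model only ever sees the prompt, so the quality of the answer depends directly on what the retrieval step returns.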
Architecture Overview
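Here's what we're building: documents stored in S3 are indexed by an Amazon Bedrock Knowledge Base into an OpenSearch Serverless vector store. A web frontend sends each question through API Gateway to a Lambda function, which retrieves the most relevant passages from the knowledge base and passes them to Claude on Bedrock to generate a grounded answer.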

Prerequisites
Before we begin, ensure you have:
AWS Account with appropriate permissions
Amazon Bedrock Access requested (go to Bedrock console → Model access)
Python 3.9+ and AWS CLI configured
Sample documents for testing (PDFs, Word docs, text files)

Step 1: Setting Up the Knowledge Base
1.1 Create an S3 Bucket for Your Documents
# Create a unique bucket name
BUCKET_NAME="enterprise-rag-documents-$(date +%s)"
aws s3 mb s3://$BUCKET_NAME
# Upload sample documents
aws s3 cp ./documents/ s3://$BUCKET_NAME/ --recursive
1.2 Configure Amazon Bedrock Knowledge Base
Navigate to Amazon Bedrock → Knowledge Bases → Create Knowledge Base
Configuration Parameters:
Knowledge base name: enterprise-knowledge-base
IAM role: Create new role with S3 and Bedrock permissions
Data source: Your S3 bucket
Embeddings model: amazon.titan-embed-text-v2:0 (default)
Vector database: Choose "Quick create a new vector store"
Advanced settings: Enable hybrid search for better results
The same settings expressed as an API-style configuration:
{
  "knowledgeBaseConfiguration": {
    "type": "VECTOR",
    "vectorKnowledgeBaseConfiguration": {
      "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
    }
  },
  "storageConfiguration": {
    "type": "OPENSEARCH_SERVERLESS",
    "opensearchServerlessConfiguration": {
      "collectionArn": "arn:aws:aoss:us-east-1:123456789012:collection/your-collection",
      "vectorIndexName": "enterprise-docs-index",
      "fieldMapping": {
        "vectorField": "embedding",
        "textField": "content",
        "metadataField": "metadata"
      }
    }
  },
  "dataSourceConfiguration": {
    "type": "S3",
    "s3Configuration": {
      "bucketArn": "arn:aws:s3:::your-documents-bucket",
      "inclusionPrefixes": ["documents/"]
    }
  }
}
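If you prefer scripting over the console, the configuration above maps onto two calls of the boto3 `bedrock-agent` client: `create_knowledge_base` takes the knowledge base and storage sections, and `create_data_source` takes the data source section. The sketch below is a minimal illustration under that assumption; the helper names, the data source naming scheme, and the role ARN argument are hypothetical and not part of the original walkthrough.

```python
def split_kb_config(config, name, role_arn):
    """Split the combined config document into keyword arguments for the two API calls."""
    kb_args = {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": config["knowledgeBaseConfiguration"],
        "storageConfiguration": config["storageConfiguration"],
    }
    data_source_args = {
        "name": f"{name}-s3-source",  # hypothetical naming scheme
        "dataSourceConfiguration": config["dataSourceConfiguration"],
    }
    return kb_args, data_source_args

def create_knowledge_base(config, name, role_arn):
    """Create the knowledge base, then attach the S3 data source to it."""
    import boto3  # imported lazily so split_kb_config can be tested offline

    client = boto3.client("bedrock-agent")
    kb_args, ds_args = split_kb_config(config, name, role_arn)
    kb = client.create_knowledge_base(**kb_args)["knowledgeBase"]
    ds = client.create_data_source(knowledgeBaseId=kb["knowledgeBaseId"], **ds_args)["dataSource"]
    return kb["knowledgeBaseId"], ds["dataSourceId"]
```

Keeping the argument-building step separate from the API calls makes it easy to review exactly what will be sent before anything is created.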
Step 2: Building the Backend Orchestrator
2.1 Create Lambda Function with Dependencies
Create a requirements.txt:
boto3>=1.28.0
aws-lambda-powertools>=2.0.0
python-dotenv>=1.0.0
Create the Lambda function:
# lambda_handler.py
import json
import os
from typing import Any, Dict

import boto3
from botocore.exceptions import ClientError

# Initialize AWS clients once so warm invocations can reuse them
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
bedrock = boto3.client('bedrock-runtime')


class RAGOrchestrator:
    def __init__(self, knowledge_base_id: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0"):
        self.knowledge_base_id = knowledge_base_id
        self.model_id = model_id
        self.region = os.environ.get('AWS_REGION', 'us-east-1')

    def retrieve_context(self, query: str, max_results: int = 5) -> Dict[str, Any]:
        """Retrieve relevant context from knowledge base"""
        try:
            response = bedrock_agent_runtime.retrieve(
                knowledgeBaseId=self.knowledge_base_id,
                retrievalQuery={
                    'text': query
                },
                retrievalConfiguration={
                    'vectorSearchConfiguration': {
                        'numberOfResults': max_results,
                        'overrideSearchType': 'HYBRID'
                    }
                }
            )
            # Extract and format retrieved passages
            contexts = []
            for result in response.get('retrievalResults', []):
                contexts.append({
                    'content': result['content']['text'],
                    'metadata': result.get('metadata', {}),
                    'score': result.get('score', 0.0)
                })
            return {
                'contexts': contexts,
                'total_results': len(contexts)
            }
        except ClientError as e:
            print(f"Error retrieving context: {e}")
            return {'contexts': [], 'total_results': 0}

    def generate_response(self, query: str, context: str) -> str:
        """Generate response using LLM with retrieved context"""
        # Prepare the prompt with context. The Messages API supplies the
        # user/assistant roles, so no "Human:"/"Assistant:" markers are needed.
        prompt = f"""You are an expert assistant for our enterprise. Use the following context to answer the question.

Context:
{context}

Question: {query}

Instructions:
1. Answer based ONLY on the provided context
2. If the context doesn't contain relevant information, say "I don't have enough information to answer this question based on the available documents."
3. Cite specific sources when possible
4. Keep the response concise and professional"""
        try:
            # For Claude models
            response = bedrock.invoke_model(
                modelId=self.model_id,
                body=json.dumps({
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 1000,
                    "messages": [
                        {
                            "role": "user",
                            "content": prompt
                        }
                    ]
                }),
                contentType='application/json'
            )
            response_body = json.loads(response['body'].read())
            return response_body['content'][0]['text']
        except ClientError as e:
            print(f"Error generating response: {e}")
            return "I apologize, but I'm having trouble generating a response at the moment."


def lambda_handler(event, context):
    """Main Lambda handler"""
    # API Gateway proxy events carry the JSON payload as a string under 'body';
    # direct invocations pass the fields at the top level of the event
    try:
        payload = json.loads(event['body']) if event.get('body') else event
        query = (payload.get('query') or '').strip()
    except (AttributeError, TypeError, json.JSONDecodeError):
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Request body must be a JSON object'})
        }
    if not query:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': 'Query is required'})
        }
    # Initialize orchestrator
    knowledge_base_id = os.environ['KNOWLEDGE_BASE_ID']
    orchestrator = RAGOrchestrator(knowledge_base_id)
    # Step 1: Retrieve relevant context
    retrieval_result = orchestrator.retrieve_context(query)
    if retrieval_result['total_results'] == 0:
        return {
            'statusCode': 200,
            'body': json.dumps({
                'response': "I couldn't find relevant information in our knowledge base to answer your question.",
                'sources': []
            })
        }
    # Combine retrieved contexts
    combined_context = "\n\n".join([
        f"Source {i+1}:\n{ctx['content']}\n[Metadata: {ctx['metadata']}]"
        for i, ctx in enumerate(retrieval_result['contexts'])
    ])
    # Step 2: Generate response using LLM
    response = orchestrator.generate_response(query, combined_context)
    # Prepare sources for citation
    sources = [
        {
            'content': ctx['content'][:200] + '...',  # Preview
            'metadata': ctx['metadata'],
            'relevance_score': ctx['score']
        }
        for ctx in retrieval_result['contexts']
    ]
    return {
        'statusCode': 200,
        'body': json.dumps({
            'response': response,
            'sources': sources,
            'retrieved_context_count': retrieval_result['total_results']
        })
    }
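When the function sits behind API Gateway's proxy integration, the request JSON arrives as a string under the event's `body` key, while a direct test invocation usually passes the fields at the top level. The sketch below shows one way to build a proxy-style event for local testing and to read the query from either shape; both helper names are hypothetical, and the event contains only the fields relevant here.

```python
import json

def make_api_gateway_event(query):
    """Build a minimal proxy-style event resembling what API Gateway sends."""
    return {
        "httpMethod": "POST",
        "path": "/query",
        "body": json.dumps({"query": query}),
    }

def extract_query(event):
    """Return the query from a proxy event or from a direct invocation payload."""
    body = event.get("body")
    payload = json.loads(body) if isinstance(body, str) else event
    return (payload.get("query") or "").strip()
```

Calling `extract_query(make_api_gateway_event("What is our vacation policy?"))` returns the question text, which is a quick way to check the parsing logic without deploying anything.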
2.2 Deploy with AWS SAM (Optional)
Create a template.yaml for easy deployment:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Enterprise RAG Chatbot
Parameters:
  KnowledgeBaseId:
    Type: String
    Description: ID of the Bedrock knowledge base created in Step 1
Resources:
  RagChatbotFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambda/
      Handler: lambda_handler.lambda_handler
      Runtime: python3.12
      Timeout: 30
      MemorySize: 512
      Environment:
        Variables:
          KNOWLEDGE_BASE_ID: !Ref KnowledgeBaseId
      Policies:
        # Inline statements: retrieval on the knowledge base and model invocation
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: bedrock:Retrieve
              Resource: !Sub arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:knowledge-base/${KnowledgeBaseId}
            - Effect: Allow
              Action: bedrock:InvokeModel
              Resource: !Sub arn:aws:bedrock:${AWS::Region}::foundation-model/*
        - S3ReadPolicy:
            BucketName: !Ref DocumentBucket
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /query
            Method: post
  DocumentBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub enterprise-docs-${AWS::AccountId}
Outputs:
  ApiEndpoint:
    Description: "API Gateway endpoint URL"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/query"
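Once the stack is deployed, you can exercise the endpoint from a script before wiring up any frontend. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the endpoint URL is whatever the `ApiEndpoint` output reports for your stack.

```python
import json
import urllib.request

def build_query_request(endpoint, query):
    """Build the POST request that the /query route expects."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(endpoint, query, timeout=30):
    """Send the question and return the decoded JSON response."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `ask(endpoint_url, "How do I submit an expense report?")` should return a dictionary with `response` and `sources` keys if the Lambda function is working as described above.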
Step 3: Creating a Simple Web Interface
Create a basic HTML and JavaScript frontend (index.html):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Enterprise RAG Chatbot</title>
<script src="https://cdn.tailwindcss.com"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
</head>
<body class="bg-gray-50 min-h-screen">
<div class="container mx-auto px-4 py-8 max-w-4xl">
<header class="mb-8">
<h1 class="text-3xl font-bold text-gray-800 mb-2">
<i class="fas fa-robot mr-3 text-blue-500"></i>
Enterprise Knowledge Assistant
</h1>
<p class="text-gray-600">Ask questions about your company documents, policies, and procedures.</p>
</header>
<div class="bg-white rounded-lg shadow-lg p-6 mb-6">
<div id="chat-container" class="h-96 overflow-y-auto mb-4 p-4 border rounded-lg bg-gray-50">
<div class="text-center text-gray-500 py-8">
<i class="fas fa-comments text-3xl mb-3"></i>
<p>Start a conversation by typing your question below.</p>
</div>
</div>
<div class="flex space-x-4">
<input
type="text"
id="query-input"
placeholder="Ask about company policies, procedures, or documents..."
class="flex-grow p-3 border rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-blue-500 outline-none"
>
<button
id="send-btn"
class="bg-blue-500 text-white px-6 py-3 rounded-lg hover:bg-blue-600 transition font-semibold"
>
<i class="fas fa-paper-plane mr-2"></i>Ask
</button>
</div>
<div class="mt-4 text-sm text-gray-500">
<p><i class="fas fa-info-circle mr-1"></i> This chatbot searches through all company documents to find accurate answers.</p>
</div>
</div>
<div id="sources-panel" class="bg-white rounded-lg shadow-lg p-6 hidden">
<h3 class="text-lg font-semibold mb-4 text-gray-700">
<i class="fas fa-file-alt mr-2"></i>Sources Used
</h3>
<div id="sources-list"></div>
</div>
</div>
<script>
const API_ENDPOINT = 'YOUR_API_GATEWAY_ENDPOINT'; // Replace with your endpoint
document.getElementById('send-btn').addEventListener('click', sendQuery);
document.getElementById('query-input').addEventListener('keypress', (e) => {
if (e.key === 'Enter') sendQuery();
});
async function sendQuery() {
const queryInput = document.getElementById('query-input');
const query = queryInput.value.trim();
if (!query) return;
// Add user message to chat
addMessage(query, 'user');
queryInput.value = '';
// Show typing indicator
const typingId = showTypingIndicator();
try {
const response = await fetch(API_ENDPOINT, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ query: query })
});
if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
const data = await response.json();
// Remove typing indicator
removeTypingIndicator(typingId);
// Add AI response
addMessage(data.response, 'ai');
// Show sources if available
if (data.sources && data.sources.length > 0) {
showSources(data.sources);
}
} catch (error) {
console.error('Error:', error);
removeTypingIndicator(typingId);
addMessage('Sorry, there was an error processing your request.', 'ai');
}
}
function addMessage(content, sender) {
const chatContainer = document.getElementById('chat-container');
const messageDiv = document.createElement('div');
messageDiv.className = `mb-4 ${sender === 'user' ? 'text-right' : ''}`;
const bubble = document.createElement('div');
bubble.className = `inline-block p-4 rounded-lg max-w-xs md:max-w-md ${
sender === 'user'
? 'bg-blue-500 text-white rounded-br-none'
: 'bg-gray-200 text-gray-800 rounded-bl-none'
}`;
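const text = document.createElement('p');
text.className = 'whitespace-pre-wrap';
text.textContent = content; // textContent keeps user and model text from being parsed as HTML
bubble.appendChild(text);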
messageDiv.appendChild(bubble);
chatContainer.appendChild(messageDiv);
chatContainer.scrollTop = chatContainer.scrollHeight;
}
function showTypingIndicator() {
const chatContainer = document.getElementById('chat-container');
const typingDiv = document.createElement('div');
typingDiv.id = 'typing-indicator';
typingDiv.className = 'mb-4';
typingDiv.innerHTML = `
<div class="inline-block p-4 rounded-lg bg-gray-200 rounded-bl-none">
<div class="flex space-x-1">
<div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce"></div>
<div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.2s"></div>
<div class="w-2 h-2 bg-gray-500 rounded-full animate-bounce" style="animation-delay: 0.4s"></div>
</div>
</div>
`;
chatContainer.appendChild(typingDiv);
chatContainer.scrollTop = chatContainer.scrollHeight;
return 'typing-indicator';
}
function removeTypingIndicator(id) {
const indicator = document.getElementById(id);
if (indicator) indicator.remove();
}
function showSources(sources) {
const sourcesPanel = document.getElementById('sources-panel');
const sourcesList = document.getElementById('sources-list');
// Escape retrieved text before placing it inside an HTML template
const escapeHtml = (value) => {
const holder = document.createElement('div');
holder.textContent = String(value);
return holder.innerHTML;
};
sourcesPanel.classList.remove('hidden');
sourcesList.innerHTML = '';
sources.forEach((source, index) => {
const sourceDiv = document.createElement('div');
sourceDiv.className = 'mb-3 p-3 border rounded-lg hover:bg-gray-50';
sourceDiv.innerHTML = `
<div class="flex justify-between items-start">
<h4 class="font-medium text-gray-800">Source ${index + 1}</h4>
<span class="text-xs bg-blue-100 text-blue-800 px-2 py-1 rounded">Score: ${Number(source.relevance_score).toFixed(3)}</span>
</div>
<p class="text-sm text-gray-600 mt-2">${escapeHtml(source.content)}</p>
<div class="text-xs text-gray-500 mt-2">
<i class="fas fa-tag mr-1"></i>${escapeHtml(JSON.stringify(source.metadata))}
</div>
`;
sourcesList.appendChild(sourceDiv);
});
}
</script>
</body>
</html>
Step 4: Advanced Features & Optimization
4.1 Implementing Conversation Memory
Add a DynamoDB table for conversation history:
# Add to your Lambda function
import json
from datetime import datetime, timezone
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
conversation_table = dynamodb.Table('RAGConversations')


class ConversationManager:
    def __init__(self, session_id):
        self.session_id = session_id

    def save_interaction(self, query: str, response: str, sources: list):
        now = datetime.now(timezone.utc)
        conversation_table.put_item(
            Item={
                'session_id': self.session_id,
                'timestamp': now.isoformat(),
                'query': query,
                'response': response,
                # DynamoDB rejects Python floats (such as relevance scores), so convert them to Decimal
                'sources': json.loads(json.dumps(sources), parse_float=Decimal),
                'ttl': int(now.timestamp()) + 86400  # 24-hour TTL
            }
        )

    def get_conversation_history(self, limit: int = 5):
        # Assumes session_id is the partition key and timestamp is the sort key
        response = conversation_table.query(
            KeyConditionExpression=Key('session_id').eq(self.session_id),
            ScanIndexForward=False,  # newest interactions first
            Limit=limit
        )
        return response.get('Items', [])
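Stored history only helps if it reaches the model. One simple approach, sketched below, is to render the most recent turns as a short transcript and place it ahead of the retrieved context in the prompt. The function names and the `max_turns` default are hypothetical; the item fields (`query`, `response`) match what `save_interaction` writes, and the input is assumed to be ordered newest first, as `get_conversation_history` returns it.

```python
def format_history(items, max_turns=3):
    """Render stored interactions (newest first) as a chronological transcript."""
    recent = list(reversed(items[:max_turns]))
    lines = []
    for item in recent:
        lines.append(f"User: {item['query']}")
        lines.append(f"Assistant: {item['response']}")
    return "\n".join(lines)

def build_prompt_with_history(query, context, history_items):
    """Prefix the RAG prompt with the recent conversation, if there is any."""
    transcript = format_history(history_items)
    history_block = f"Conversation so far:\n{transcript}\n\n" if transcript else ""
    return f"{history_block}Context:\n{context}\n\nQuestion: {query}"
```

Capping the number of turns keeps the prompt size, and therefore the input token cost, predictable as conversations grow.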
4.2 Adding Document-Level Access Control
Implement metadata filtering based on user roles:
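def retrieve_with_access_control(query: str, user_roles: list, knowledge_base_id: str):
    # A document is visible when its allowed_roles metadata matches ANY of the user's roles
    role_filters = [
        {
            'equals': {
                'key': 'allowed_roles',
                'value': user_role
            }
        }
        for user_role in user_roles
    ]
    if not role_filters:
        raise ValueError("user_roles must contain at least one role")
    # orAll needs at least two conditions, so a single condition is passed directly
    filter_conditions = role_filters[0] if len(role_filters) == 1 else {'orAll': role_filters}
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'filter': filter_conditions,
                'numberOfResults': 5
            }
        }
    )
    return response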
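Metadata filtering only works if each document carries the attribute being filtered on. For S3 data sources, Bedrock Knowledge Bases read attributes from a sidecar file stored next to the document and named `<document>.metadata.json`, with the values under a `metadataAttributes` key. The sketch below generates such a sidecar; it assumes one role string per document so that the `equals` filter applies, and the helper names and example key are hypothetical.

```python
import json

def metadata_sidecar(document_key, allowed_role):
    """Return the S3 key and JSON body of the metadata sidecar for one document."""
    sidecar_key = f"{document_key}.metadata.json"
    body = json.dumps({"metadataAttributes": {"allowed_roles": allowed_role}})
    return sidecar_key, body

def upload_sidecar(bucket, document_key, allowed_role):
    """Upload the sidecar next to the document (requires AWS credentials)."""
    import boto3  # imported lazily so metadata_sidecar can be tested offline

    sidecar_key, body = metadata_sidecar(document_key, allowed_role)
    boto3.client("s3").put_object(Bucket=bucket, Key=sidecar_key, Body=body.encode("utf-8"))
    return sidecar_key
```

After adding or changing sidecar files, the data source needs to be synced again before the new attributes are available to filters.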
Step 5: Testing & Validation
Test Cases to Validate Your RAG System:
test_cases = [
    {
        "query": "What is our vacation policy for senior employees?",
        "expected_characteristics": ["should cite HR documents", "mention specific vacation days"]
    },
    {
        "query": "How do I submit an expense report?",
        "expected_characteristics": ["mention the expense portal", "provide step-by-step instructions"]
    },
    {
        "query": "What was our Q3 revenue?",
        "expected_characteristics": ["cite financial reports", "provide specific numbers"]
    }
]
# Evaluation metrics to track:
# 1. Response Relevance (0-5 scale)
# 2. Citation Accuracy (are sources actually relevant?)
# 3. Hallucination Rate (percentage of made-up information)
# 4. Response Time (should be under 5 seconds)
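Some of these checks can be automated. The sketch below runs each test case through a callable that answers questions, then records latency against the 5-second target, whether any sources came back, and how many expected keywords appear in the answer. It is a rough proxy only: the `expected_keywords` field and the `ask_fn` callable are assumptions introduced for this example, and relevance and hallucination rate still need human or model-based review.

```python
import time

def evaluate(test_cases, ask_fn, max_seconds=5.0):
    """Run each test case through ask_fn and collect simple, automatable checks."""
    results = []
    for case in test_cases:
        start = time.perf_counter()
        answer = ask_fn(case["query"])
        elapsed = time.perf_counter() - start
        keywords = case.get("expected_keywords", [])  # hypothetical field
        hits = [k for k in keywords if k.lower() in answer["response"].lower()]
        results.append({
            "query": case["query"],
            "keyword_recall": len(hits) / len(keywords) if keywords else None,
            "has_sources": bool(answer.get("sources")),
            "within_latency_budget": elapsed <= max_seconds,
        })
    return results
```

In practice `ask_fn` would call the deployed API; for a dry run, any function that returns a dictionary with `response` and `sources` keys will do.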
Cost Estimation & Optimization
Monthly Cost Breakdown (Estimated):
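Amazon Bedrock (Claude 3 Sonnet): ~$3 per 1M input tokens and ~$15 per 1M output tokens
OpenSearch Serverless: ~$0.24 per OCU-hour (one OCU running continuously is roughly $175/month; collections have a minimum OCU allocation, so check current pricing for your region)
Lambda: ~$0.20 per million requests, plus duration charges billed per GB-second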
S3: ~$0.023 per GB storage
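To turn the unit prices into a monthly figure, multiply each by your expected usage. The sketch below does that arithmetic for input tokens, OpenSearch Serverless capacity, Lambda requests, and S3 storage. It is a rough estimate under stated assumptions: 730 hours per month, no output-token or Lambda duration charges, and an OCU hourly rate that you supply from the current pricing page. The function and parameter names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours

def estimate_monthly_cost(queries, input_tokens_per_query, ocu_hourly_rate, ocu_count,
                          storage_gb, input_price_per_million=3.0,
                          lambda_price_per_million=0.20, s3_price_per_gb=0.023):
    """Return a per-service cost breakdown in USD; excludes output tokens and Lambda duration."""
    breakdown = {
        "bedrock_input": queries * input_tokens_per_query / 1_000_000 * input_price_per_million,
        "opensearch": ocu_hourly_rate * ocu_count * HOURS_PER_MONTH,
        "lambda_requests": queries / 1_000_000 * lambda_price_per_million,
        "s3_storage": storage_gb * s3_price_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

For example, 100,000 queries a month at 2,000 input tokens each comes to 200M input tokens, or about $600 at $3 per 1M tokens. At that volume the model and the vector store dominate the bill, while Lambda requests and S3 storage are negligible.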
Cost Optimization Tips:
Use caching: Cache frequent queries in DynamoDB
Implement query optimization: Use query rewriting to improve retrieval
Monitor usage: Set up CloudWatch alarms for cost thresholds
Consider smaller models: Use Claude Haiku for simpler queries
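As an illustration of the caching tip, the sketch below normalizes each query into a stable key and stores answers with an expiry time. An in-memory dictionary stands in for DynamoDB here so the example stays self-contained; the class and function names, the one-hour default TTL, and the normalization rule (lowercase, collapsed whitespace) are all assumptions made for this example.

```python
import hashlib
import time

def cache_key(query):
    """Normalize case and whitespace so trivially different queries share a key."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class TTLCache:
    """In-memory stand-in for a DynamoDB table with a TTL attribute."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get(self, query):
        """Return the cached answer, or None if it is missing or expired."""
        entry = self._entries.get(cache_key(query))
        if entry is None or entry[0] <= self.clock():
            return None
        return entry[1]

    def put(self, query, answer):
        """Store an answer together with its expiry timestamp."""
        self._entries[cache_key(query)] = (self.clock() + self.ttl_seconds, answer)
```

Checking the cache before calling `retrieve` and `invoke_model` skips both charges on a hit. Keep the TTL short enough that answers do not outlive the documents they were drawn from.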
Best Practices for Production
Data Pipeline Management:
Automate document ingestion with S3 Event Notifications
Implement data quality checks before indexing
Schedule regular knowledge base synchronization
Security:
Encrypt data at rest (S3 SSE-S3/SSE-KMS)
Implement API authentication (Cognito, API Keys)
Use VPC endpoints for private access
Enable Bedrock guardrails for content filtering
Monitoring:
Track retrieval hit/miss rates
Monitor response latency (95th percentile < 2s)
Set up user feedback collection (thumbs up/down)
Log all queries for compliance
Performance Tuning:
Experiment with different embedding models
Adjust chunking strategy (size, overlap)
Implement query expansion techniques
Use metadata filtering for better precision
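The automated ingestion practice under Data Pipeline Management can be sketched as a second Lambda function that is subscribed to the bucket's S3 event notifications and starts a knowledge base sync whenever a document changes. This is a minimal illustration: the handler name, the `documents/` prefix check, and the `KNOWLEDGE_BASE_ID` and `DATA_SOURCE_ID` environment variables are assumptions, while `start_ingestion_job` is the boto3 `bedrock-agent` call that triggers the sync.

```python
import os
from urllib.parse import unquote_plus

def changed_keys(event, prefix="documents/"):
    """Collect object keys under the prefix from an S3 event notification."""
    keys = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if key.startswith(prefix):
            keys.append(key)
    return keys

def handler(event, context):
    """Start a knowledge base sync when relevant documents were added or changed."""
    if not changed_keys(event):
        return {"started": False}
    import boto3  # imported lazily so changed_keys can be tested offline

    client = boto3.client("bedrock-agent")
    job = client.start_ingestion_job(
        knowledgeBaseId=os.environ["KNOWLEDGE_BASE_ID"],
        dataSourceId=os.environ["DATA_SOURCE_ID"],
    )
    return {"started": True, "jobId": job["ingestionJob"]["ingestionJobId"]}
```

For buckets that receive many uploads in bursts, batching notifications (for example through a queue) avoids starting overlapping ingestion jobs.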
Common Pitfalls & Solutions
| Pitfall | Solution |
| --- | --- |
| Poor retrieval quality | Implement hybrid search, adjust chunk sizes, add metadata filtering |
| Hallucinations | Add strict prompt instructions, implement confidence scoring |
| Slow response times | Add caching, optimize Lambda memory, use async processing |
| Irrelevant sources | Fine-tune embedding model, improve document preprocessing |
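Confidence scoring, mentioned in the table as a guard against hallucinations, can start as simply as dropping weakly matching passages and labelling the answer by its best retrieval score. The sketch below works on the `contexts` list shape used earlier (dictionaries with a `score` field). The thresholds of 0.4 and 0.7 are placeholders, not recommended values; score ranges differ between search types, so calibrate them against your own data.

```python
def filter_by_score(contexts, min_score=0.4):
    """Keep only passages whose retrieval score meets the threshold."""
    return [c for c in contexts if c.get("score", 0.0) >= min_score]

def confidence_label(contexts, high=0.7, medium=0.4):
    """Label an answer by the best retrieval score among its supporting passages."""
    if not contexts:
        return "none"
    best = max(c.get("score", 0.0) for c in contexts)
    if best >= high:
        return "high"
    return "medium" if best >= medium else "low"
```

If filtering leaves no passages, returning the "not enough information" message without calling the model is usually safer than generating from weak context.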
Conclusion
Building an enterprise RAG chatbot with Amazon Bedrock provides a powerful, scalable solution for making proprietary data accessible through natural language. The managed services approach significantly reduces operational overhead while providing enterprise-grade security and reliability.
Key Advantages of This Architecture:
✅ No infrastructure management - Fully managed by AWS
✅ Enterprise security - Private, compliant, and secure
✅ Scalable - Lambda and Bedrock scale with demand, within your account's service quotas
✅ Cost-effective - Pay-per-use pricing model
✅ Accurate - Grounded in your actual documents
Next Steps for Your Implementation:
Start with a pilot department (e.g., HR or IT documentation)
Collect user feedback and iterate on prompt engineering
Implement advanced features like multi-modal support (images, tables)
Consider fine-tuning embeddings on your domain-specific data
Explore integration with existing systems (SharePoint, Confluence, Salesforce)
Need help implementing this? Have questions about specific use cases? Leave a comment below or reach out to me.
Ready to deploy? Use the AWS CloudFormation template below for a one-click deployment:
# Save as rag-chatbot-cfn.yaml
# Deploy with: aws cloudformation create-stack --stack-name enterprise-rag-chatbot --template-body file://rag-chatbot-cfn.yaml




