Build Enterprise RAG Chatbot with Amazon Bedrock

Phase 1: Setup & Preparation (30-45 minutes)

Step 1: AWS Account & Permissions Setup

1.1 Login to AWS Console

Go to https://aws.amazon.com and sign in
If you are new to AWS, create an account (there is a free tier, but a payment method is required)

1.2 Create IAM User for SageMaker (Don't use root!)

Go to IAM Service
Click "Users" → "Create user"
Username: sagemaker-user
Select "Attach policies directly"
Add these policies:
- AmazonSageMakerFullAccess
- AmazonS3FullAccess
- AWSCloudFormationFullAccess
- IAMFullAccess (temporarily, for setup)
Click "Create user"
Open the new user, then go to the "Security credentials" tab
Click "Create access key"
Select "Command Line Interface (CLI)"

Copy the Access Key ID and Secret Access Key

1.3 Configure AWS CLI on Your Machine

# Install AWS CLI (if not installed)
# For Mac:
brew install awscli
# For Ubuntu:
sudo apt-get install awscli
# For Windows (PowerShell):
winget install -e --id Amazon.AWSCLI

# Configure AWS CLI
aws configure
# Enter:
# AWS Access Key ID: [paste from step above]
# AWS Secret Access Key: [paste from step above]
# Default region: us-east-1 (or your preferred region)
# Default output format: json
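
As a quick sanity check after aws configure, the sketch below parses the two INI files the CLI writes (by default ~/.aws/credentials and ~/.aws/config) and lists any expected keys that are missing. The helper name missing_aws_settings is hypothetical, and the code only inspects text; it does not contact AWS or validate the keys.

```python
import configparser

REQUIRED_CREDENTIAL_KEYS = ("aws_access_key_id", "aws_secret_access_key")
REQUIRED_CONFIG_KEYS = ("region", "output")


def missing_aws_settings(credentials_text, config_text, profile="default"):
    """Return the expected keys that are absent for the given profile."""
    # In ~/.aws/config, non-default profiles use a "profile <name>" section.
    config_section = profile if profile == "default" else f"profile {profile}"
    missing = []
    for text, section_name, required in (
        (credentials_text, profile, REQUIRED_CREDENTIAL_KEYS),
        (config_text, config_section, REQUIRED_CONFIG_KEYS),
    ):
        parser = configparser.ConfigParser()
        parser.read_string(text)
        section = parser[section_name] if parser.has_section(section_name) else {}
        missing.extend(key for key in required if not section.get(key))
    return missing
```

To use it, read both files with pathlib.Path.home() / ".aws" / "credentials" (and "config") and pass their contents in; an empty list means all four values from step 1.3 were saved.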

Step 2: Request Model Access

2.1 Get Llama 3 Access on Hugging Face

# 1. Go to https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
# 2. Click "Request Access"
# 3. Fill the form (use your real details)
# 4. Wait for approval (usually within hours)

# Alternative: Use a different open model that doesn't require approval
# We'll use "mistralai/Mistral-7B-Instruct-v0.1" for this tutorial
# No approval needed!

2.2 Create Hugging Face Token (For Llama 3 if approved)

1. Go to https://huggingface.co
2. Sign up/login
3. Click profile → Settings → Access Tokens
4. Click "New token"
5. Name: aws-sagemaker
6. Role: Write (for uploading models if needed)
7. Copy the token

Step 3: Prepare Your Local Environment

3.1 Create Project Directory Structure

mkdir llama3-finetune-tutorial
cd llama3-finetune-tutorial

# Create the directory structure
mkdir -p scripts data configs outputs
mkdir -p docker train deploy monitor

3.2 Create Virtual Environment & Install Dependencies

# Create virtual environment
python -m venv venv

# Activate it
# On Mac/Linux:
source venv/bin/activate

# On Windows:
# venv\Scripts\activate

# Install required packages
pip install --upgrade pip
pip install boto3 sagemaker awscli
pip install transformers==4.36.0
pip install datasets==2.14.0
pip install peft==0.7.0
pip install accelerate==0.25.0
pip install bitsandbytes==0.41.3
pip install torch==2.1.0
pip install scikit-learn
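
Because the packages above are pinned, it can help to confirm what actually got installed. The following is a minimal sketch using the standard library's importlib.metadata; the function name find_version_mismatches and the injectable get_version argument are illustrative choices, not part of any of the listed libraries.

```python
from importlib import metadata

# Versions pinned in the pip commands above.
PINNED = {
    "transformers": "4.36.0",
    "datasets": "2.14.0",
    "peft": "0.7.0",
    "accelerate": "0.25.0",
    "bitsandbytes": "0.41.3",
    "torch": "2.1.0",
}


def find_version_mismatches(pinned, get_version=metadata.version):
    """Map package name -> (expected, installed or None) for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "2.1.0+cu121".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling find_version_mismatches(PINNED) inside the activated virtual environment returns an empty dict when every pinned package is present at the expected version.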

Step 4: Prepare Training Data

4.1 Create Sample Training Data

# Move into the data directory (created in step 3.1)
cd data

Create prepare_data.py:

# data/prepare_data.py
import json

def create_sample_data():
    """Create example training data for customer service chatbot"""

    examples = [
        {
            "instruction": "How do I reset my account password?",
            "input": "",
            "output": "To reset your password, visit our login page and click 'Forgot Password'. Enter your email address and check your inbox for a reset link. The link expires in 24 hours. If you don't receive the email, check your spam folder or contact support@ourcompany.com."
        },
        {
            "instruction": "What's your refund policy?",
            "input": "Customer purchased item 2 weeks ago",
            "output": "We offer a 30-day money-back guarantee for all products. Since your purchase was 2 weeks ago, you're eligible for a full refund. Please go to 'My Orders' in your account, select the item, and click 'Request Refund'. Refunds are processed within 5-7 business days."
        },
        {
            "instruction": "Do you offer international shipping?",
            "input": "",
            "output": "Yes, we ship to over 50 countries. International shipping takes 7-14 business days and costs $25. Additional customs fees may apply depending on your country's regulations. You can see exact shipping costs at checkout."
        },
        {
            "instruction": "How do I contact customer support?",
            "input": "",
            "output": "You can contact our support team in three ways: 1) Email: support@ourcompany.com (response within 24 hours), 2) Live Chat: Available Mon-Fri 9AM-6PM EST on our website, 3) Phone: 1-800-555-HELP (Mon-Fri 8AM-8PM EST)."
        },
        {
            "instruction": "What payment methods do you accept?",
            "input": "Customer is from Europe",
            "output": "We accept Visa, MasterCard, American Express, PayPal, and Apple Pay. For European customers, we also support SEPA transfers and iDEAL. All payments are processed securely through PCI-compliant systems."
        }
    ]

    # Create 100 synthetic examples (for demo - in real life, use real data)
    all_examples = []
    base_scenarios = examples.copy()

    for i in range(100):
        base = base_scenarios[i % len(base_scenarios)]
        new_example = base.copy()

        # Add some variation
        if "password" in new_example["instruction"].lower():
            variations = [
                "I forgot my password",
                "Can't login to my account",
                "Need to change my password"
            ]
            new_example["instruction"] = variations[i % len(variations)]

        # Format for training
        text = f"### Instruction:\n{new_example['instruction']}\n\n"
        if new_example['input']:
            text += f"### Input:\n{new_example['input']}\n\n"
        text += f"### Response:\n{new_example['output']}"

        all_examples.append({"text": text})

    # Save to JSON
    with open('train.json', 'w') as f:
        json.dump(all_examples, f, indent=2)

    # Also save in instruction format
    instruction_examples = []
    for ex in all_examples:
        # The instruction is the text after the "### Instruction:" header,
        # and the response is everything after "### Response:".
        head, response = ex['text'].split('### Response:\n', 1)
        instruction = head.split('### Instruction:\n', 1)[1].split('\n\n', 1)[0].strip()
        instruction_examples.append({
            "instruction": instruction,
            "response": response.strip()
        })

    with open('instructions.json', 'w') as f:
        json.dump(instruction_examples, f, indent=2)

    print(f"Created {len(all_examples)} training examples")
    print(f"Sample: {all_examples[0]['text'][:200]}...")

    return all_examples

if __name__ == "__main__":
    create_sample_data()

Run it:

python prepare_data.py

# Return to the project root; later commands use paths such as data/train.json
cd ..
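
The training text uses a simple "### Instruction / ### Input / ### Response" layout. As an illustration of that layout, here is a small sketch that builds one example and splits it back into fields; format_example and parse_example are hypothetical helper names, and the parser assumes the headers appear exactly as written in prepare_data.py.

```python
def format_example(instruction, output, input_text=""):
    """Build the '### Instruction / ### Input / ### Response' training text."""
    text = f"### Instruction:\n{instruction}\n\n"
    if input_text:
        text += f"### Input:\n{input_text}\n\n"
    text += f"### Response:\n{output}"
    return text


def parse_example(text):
    """Split a formatted training text back into its fields."""
    head, output = text.split("### Response:\n", 1)
    # partition() leaves input_part empty when there is no Input section.
    instruction_part, _, input_part = head.partition("### Input:\n")
    instruction = instruction_part.replace("### Instruction:\n", "", 1).strip()
    return {
        "instruction": instruction,
        "input": input_part.strip(),
        "output": output.strip(),
    }
```

A round trip such as parse_example(format_example("Hi", "Hello")) is a cheap way to confirm that hand-written records follow the same layout as the generated ones.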

4.2 Create Validation Data

Create data/validation.json:

[
  {
    "text": "### Instruction:\nHow do I track my order?\n\n### Response:\nYou can track your order by logging into your account and going to 'Order History'. Click on the order number to see tracking details. You'll receive tracking emails at every major shipment milestone. For urgent inquiries, contact support@ourcompany.com."
  },
  {
    "text": "### Instruction:\nDo you have a mobile app?\n\n### Input:\nCustomer uses iPhone\n\n### Response:\nYes, we have both iOS and Android apps. You can download our iOS app from the App Store by searching 'OurCompany'. The app includes all website features plus push notifications for order updates and exclusive mobile-only deals."
  }
]
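
Hand-edited JSON is easy to get subtly wrong, so a small check before uploading can save a failed training job. The sketch below flags records that lack a "text" field or the two required headers; find_invalid_records is a hypothetical helper, and the rules it enforces are only the ones visible in the samples above.

```python
REQUIRED_MARKERS = ("### Instruction:\n", "### Response:\n")


def find_invalid_records(records):
    """Return (index, reason) pairs for records that do not match the training format."""
    problems = []
    for index, record in enumerate(records):
        text = record.get("text") if isinstance(record, dict) else None
        if not isinstance(text, str):
            problems.append((index, "missing 'text' string"))
            continue
        for marker in REQUIRED_MARKERS:
            if marker not in text:
                problems.append((index, f"missing marker {marker.strip()!r}"))
        # A record that stops right after the Response header has nothing to learn from.
        if text.rstrip().endswith("### Response:"):
            problems.append((index, "empty response"))
    return problems
```

Load the file with json.load and pass the resulting list in; an empty result means every record has the expected shape.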

Phase 2: SageMaker Setup (20 minutes)

Step 5: Create S3 Bucket for Data & Models

5.1 Create Bucket

# Create unique bucket name (must be globally unique)
BUCKET_NAME="llama3-finetune-$(date +%s)-$RANDOM"
echo "Bucket name: $BUCKET_NAME"

# Create bucket
aws s3 mb s3://$BUCKET_NAME

# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key data/train/
aws s3api put-object --bucket $BUCKET_NAME --key data/validation/
aws s3api put-object --bucket $BUCKET_NAME --key models/
aws s3api put-object --bucket $BUCKET_NAME --key outputs/
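
If you later hard-code the bucket name in Python scripts, a quick format check can catch typos before a job is launched. This is a rough sketch that covers only a subset of the S3 naming rules (length, allowed characters, first and last character, no consecutive dots); the helper names are hypothetical and a name that passes can still be rejected by S3, for example because it is already taken.

```python
import re

# Subset of the S3 naming rules: 3-63 characters, lowercase letters, digits,
# hyphens and dots, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Check a bucket name against a subset of the S3 naming rules."""
    return bool(_BUCKET_NAME.fullmatch(name)) and ".." not in name


def s3_uri(bucket, *key_parts):
    """Join a bucket and key parts into an s3:// URI."""
    key = "/".join(part.strip("/") for part in key_parts)
    return f"s3://{bucket}/{key}"
```

For example, s3_uri(bucket, "data/train", "train.json") produces the same destination used by the upload commands in step 5.2.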

5.2 Upload Data to S3

# Upload training data
aws s3 cp data/train.json s3://$BUCKET_NAME/data/train/train.json
aws s3 cp data/validation.json s3://$BUCKET_NAME/data/validation/validation.json

# Verify upload
aws s3 ls s3://$BUCKET_NAME/data/train/
aws s3 ls s3://$BUCKET_NAME/data/validation/

Step 6: Create SageMaker Training Script

Create scripts/train.py:

#!/usr/bin/env python3
# scripts/train.py

import os
import sys
import json
import torch
import logging
from pathlib import Path

# Add project root to path
sys.path.append(str(Path(__file__).parent.parent))

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset, Dataset
import numpy as np

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class LLMTrainer:
    def __init__(self, config_path="configs/training_config.json"):
        """Initialize trainer with configuration"""
        with open(config_path, 'r') as f:
            self.config = json.load(f)

        logger.info(f"Configuration loaded: {self.config}")

        # Set device
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        logger.info(f"Using device: {self.device}")

    def load_model_and_tokenizer(self):
        """Load base model and tokenizer"""
        logger.info(f"Loading model: {self.config['model_name']}")

        # Configure 4-bit quantization to save memory
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True
        )

        # Load model with quantization
        self.model = AutoModelForCausalLM.from_pretrained(
            self.config["model_name"],
            quantization_config=bnb_config,
            device_map="auto",
            trust_remote_code=True,
            token=os.environ.get("HF_TOKEN") or None  # needed only for gated models such as Llama 3
        )

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.config["model_name"],
            trust_remote_code=True,
            token=os.environ.get("HF_TOKEN") or None  # needed only for gated models such as Llama 3
        )

        # Set padding token
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        logger.info(f"Model loaded: {self.config['model_name']}")
        logger.info(f"Tokenizer vocab size: {len(self.tokenizer)}")

    def prepare_model_for_training(self):
        """Apply LoRA configuration to model"""
        logger.info("Preparing model for LoRA training...")

        # Prepare model for k-bit training
        self.model = prepare_model_for_kbit_training(self.model)

        # Configure LoRA
        lora_config = LoraConfig(
            r=self.config["lora_r"],
            lora_alpha=self.config["lora_alpha"],
            target_modules=self.config["lora_target_modules"],
            lora_dropout=self.config["lora_dropout"],
            bias="none",
            task_type="CAUSAL_LM"
        )

        # Apply LoRA
        self.model = get_peft_model(self.model, lora_config)

        # Print trainable parameters
        self.model.print_trainable_parameters()

    def load_and_tokenize_data(self):
        """Load and tokenize training data"""
        logger.info("Loading training data...")

        # Get data paths from environment (SageMaker sets these)
        train_data_path = os.environ.get('SM_CHANNEL_TRAIN', 'data/train')
        val_data_path = os.environ.get('SM_CHANNEL_VALIDATION', 'data/validation')

        logger.info(f"Train data path: {train_data_path}")
        logger.info(f"Validation data path: {val_data_path}")

        # Load datasets
        train_files = [str(f) for f in Path(train_data_path).glob("*.json")]
        val_files = [str(f) for f in Path(val_data_path).glob("*.json")]

        train_dataset = load_dataset('json', data_files=train_files)
        val_dataset = load_dataset('json', data_files=val_files) if val_files else None

        # Tokenization function
        def tokenize_function(examples):
            return self.tokenizer(
                examples["text"],
                truncation=True,
                padding="max_length",
                max_length=self.config["max_length"]
            )

        # Tokenize datasets
        tokenized_train = train_dataset.map(
            tokenize_function,
            batched=True,
            remove_columns=train_dataset["train"].column_names
        )

        if val_dataset:
            tokenized_val = val_dataset.map(
                tokenize_function,
                batched=True,
                remove_columns=val_dataset["train"].column_names
            )
        else:
            tokenized_val = None

        logger.info(f"Training samples: {len(tokenized_train['train'])}")
        if tokenized_val:
            logger.info(f"Validation samples: {len(tokenized_val['train'])}")

        return tokenized_train["train"], tokenized_val["train"] if tokenized_val else None

    def train(self):
        """Main training loop"""
        logger.info("Starting training process...")

        # Load model and tokenizer
        self.load_model_and_tokenizer()

        # Prepare for LoRA training
        self.prepare_model_for_training()

        # Load and tokenize data
        train_dataset, val_dataset = self.load_and_tokenize_data()

        # Create data collator
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False
        )

        # Set output directory
        output_dir = "/opt/ml/model"  # SageMaker expects this

        # Configure training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=self.config["num_epochs"],
            per_device_train_batch_size=self.config["batch_size"],
            per_device_eval_batch_size=self.config["batch_size"],
            gradient_accumulation_steps=self.config["gradient_accumulation_steps"],
            warmup_steps=self.config["warmup_steps"],
            logging_steps=self.config["logging_steps"],
            save_steps=self.config["save_steps"],
            eval_steps=self.config["eval_steps"] if val_dataset else None,
            evaluation_strategy="steps" if val_dataset else "no",
            save_strategy="steps",
            save_total_limit=2,
            load_best_model_at_end=True if val_dataset else False,
            metric_for_best_model="eval_loss" if val_dataset else None,
            greater_is_better=False if val_dataset else None,
            learning_rate=self.config["learning_rate"],
            weight_decay=self.config["weight_decay"],
            fp16=False,
            bf16=self.config.get("bf16", False),
            gradient_checkpointing=self.config["gradient_checkpointing"],
            optim=self.config["optimizer"],
            report_to=["tensorboard"],
            ddp_find_unused_parameters=False,
            remove_unused_columns=False
        )

        # Initialize Trainer
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
        )

        # Start training
        logger.info("Training started...")
        train_result = trainer.train()

        # Save model
        trainer.save_model()
        self.tokenizer.save_pretrained(output_dir)

        # Save training metrics
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)

        if val_dataset:
            eval_metrics = trainer.evaluate()
            trainer.log_metrics("eval", eval_metrics)
            trainer.save_metrics("eval", eval_metrics)

        logger.info(f"Training completed! Model saved to {output_dir}")

        return metrics

def main():
    """Main entry point"""
    try:
        # Check if running in SageMaker
        sm_training_env = os.environ.get('SM_TRAINING_ENV', '')
        if sm_training_env:
            logger.info(f"Running in SageMaker environment: {sm_training_env}")

        # Initialize and run trainer
        trainer = LLMTrainer()
        metrics = trainer.train()

        logger.info("Training completed successfully!")
        logger.info(f"Final metrics: {metrics}")

    except Exception as e:
        logger.error(f"Training failed with error: {str(e)}")
        raise

if __name__ == "__main__":
    main()
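
To get a feel for how few weights LoRA actually trains, the sketch below counts adapter parameters: a linear layer with d inputs and k outputs gains r * (d + k) weights at rank r. The layer shapes and layer count are assumptions based on the published Mistral-7B architecture (hidden size 4096, key/value projection width 1024, 32 blocks); they are not read from the model, and the function names are illustrative.

```python
def lora_param_count(in_features, out_features, rank):
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# Assumed (in_features, out_features) per attention projection in Mistral-7B.
MISTRAL_7B_ATTENTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32  # assumed number of transformer blocks


def total_lora_params(target_modules, rank,
                      shapes=MISTRAL_7B_ATTENTION_SHAPES, num_layers=NUM_LAYERS):
    """Sum the adapter weights over all targeted modules and layers."""
    per_layer = sum(lora_param_count(*shapes[name], rank) for name in target_modules)
    return per_layer * num_layers
```

Under these assumptions, rank 16 on the four attention projections comes to roughly 13.6 million trainable weights, well under 1% of a 7-billion-parameter model. The figure printed by print_trainable_parameters() during training is the authoritative one.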

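Create configs/training_config.json. SageMaker uploads only source_dir (scripts/), so also keep a copy at scripts/configs/training_config.json; otherwise train.py cannot find the file inside the training container: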

{
  "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
  "num_epochs": 3,
  "batch_size": 2,
  "gradient_accumulation_steps": 4,
  "learning_rate": 2e-4,
  "weight_decay": 0.01,
  "warmup_steps": 5,
  "logging_steps": 5,
  "save_steps": 10,
  "eval_steps": 10,
  "max_length": 512,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
  "gradient_checkpointing": true,
  "bf16": true,
  "optimizer": "adamw_8bit"
}
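
Step-based settings only make sense relative to the length of the run. The sketch below approximates the number of optimizer steps from the dataset size and the batch settings, then reports any step-based setting that is larger than the whole run. It is an approximation that assumes a single GPU by default and ignores a partial final accumulation step; both function names are hypothetical.

```python
import math


def estimate_optimizer_steps(num_examples, batch_size, grad_accum_steps,
                             num_epochs, num_devices=1):
    """Approximate optimizer steps for a run (ignores a partial final accumulation)."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


def steps_never_reached(config, total_steps):
    """Return the step-based settings that are larger than the whole run."""
    keys = ("warmup_steps", "logging_steps", "save_steps", "eval_steps")
    return {key: config[key] for key in keys if config.get(key, 0) > total_steps}
```

With 100 examples, batch size 2, 4 accumulation steps and 3 epochs this gives about 36 optimizer steps, so any warmup, logging, evaluation, or checkpoint interval above that value never takes effect.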

Create scripts/requirements.txt:

transformers==4.36.0
datasets==2.14.0
accelerate==0.25.0
peft==0.7.0
bitsandbytes==0.41.3
torch==2.1.0
scikit-learn
tensorboard
sentencepiece
protobuf
einops

Step 7: Create SageMaker Entry Point Script

Create scripts/sagemaker_entry.py:

#!/usr/bin/env python3
# scripts/sagemaker_entry.py

import os
import sys
import subprocess
import argparse

def install_requirements():
    """Install required packages"""
    print("Installing requirements...")
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "-r", "/opt/ml/code/requirements.txt"
    ])

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--train", 
        action="store_true",
        help="Run training"
    )
    parser.add_argument(
        "--serve", 
        action="store_true",
        help="Run serving"
    )

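    # SageMaker passes the estimator's hyperparameters as extra CLI flags
    # (for example --num_epochs 3), so tolerate arguments this parser does not define.
    args, _unknown = parser.parse_known_args()

    # A SageMaker training job does not pass --train, so also default to
    # training when SM_TRAINING_ENV is set.
    if args.train or (not args.serve and os.environ.get("SM_TRAINING_ENV")):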
        # Install dependencies first
        install_requirements()

        # Run training
        print("Starting training...")
        from train import main as train_main
        train_main()

    elif args.serve:
        print("Serving mode - this would load the model for inference")
        # For SageMaker deployment
        pass

if __name__ == "__main__":
    main()
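
SageMaker hands the estimator's hyperparameters to the entry point as command-line flags, so train.py needs to parse them if they are meant to override the JSON config. The following is one possible sketch: the flag names match the hyperparameters used in this tutorial's launch script, while parse_hyperparameters and apply_overrides are hypothetical helpers that the scripts above do not yet call.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse known hyperparameter flags and ignore anything else."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--num_epochs", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    parser.add_argument("--lora_r", type=int)
    args, _unknown = parser.parse_known_args(argv)
    # Keep only the values that were actually supplied on the command line.
    return {key: value for key, value in vars(args).items() if value is not None}


def apply_overrides(config, overrides):
    """Return a copy of the JSON config with CLI values taking precedence."""
    merged = dict(config)
    merged.update(overrides)
    return merged
```

One way to wire this in would be to call apply_overrides(self.config, parse_hyperparameters(sys.argv[1:])) right after the config file is loaded, which keeps the JSON file as the source of defaults.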

Phase 3: Launch Training (15 minutes)

Step 8: Create Launch Script

Create launch_training.py:

#!/usr/bin/env python3
# launch_training.py

import os
import sys
import json
import boto3
import time
from datetime import datetime
from sagemaker.huggingface import HuggingFace, get_huggingface_llm_image_uri

def create_training_job():
    """Create and launch SageMaker training job"""

    # Configuration
    config = {
        "job_name": f"llama-finetune-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
        "instance_type": "ml.g5.2xlarge",  # 1x NVIDIA A10G (24 GB), enough for 4-bit LoRA on a 7B model
        "instance_count": 1,
        "volume_size": 200,  # GB
        "max_run_hours": 4,
        "use_spot_instances": True,
        "max_wait_hours": 8,
        "bucket_name": "llama3-finetune-1234567890",  # Your bucket from earlier
        "role_arn": None,  # Will get from SageMaker
    }

    # Initialize session
    session = boto3.Session()
    sagemaker_session = boto3.Session().client('sagemaker')

    # Get SageMaker execution role
    if not config["role_arn"]:
        # Try to get default role
        try:
            iam = boto3.client('iam')
            roles = iam.list_roles(PathPrefix='/service-role/')
            for role in roles['Roles']:
                if 'AmazonSageMaker-ExecutionRole' in role['RoleName']:
                    config["role_arn"] = role['Arn']
                    break
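        except Exception as e:
            # Missing IAM permissions or credentials: fall through to the manual prompt
            print(f"Could not look up roles automatically: {e}")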

        if not config["role_arn"]:
            print("No SageMaker role found. Creating one...")
            # You'll need to create this through AWS Console first
            print("Please create a SageMaker execution role:")
            print("1. Go to IAM Console")
            print("2. Create role")
            print("3. Select 'SageMaker' as use case")
            print("4. Attach policies: AmazonSageMakerFullAccess, AmazonS3FullAccess")
            print("5. Name: AmazonSageMaker-ExecutionRole")
            print("6. Copy the ARN and paste it below")
            config["role_arn"] = input("Enter SageMaker Execution Role ARN: ")

    # Create HuggingFace estimator
    print(f"Creating training job: {config['job_name']}")

    # Hyperparameters
    hyperparameters = {
        "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
        "num_epochs": "3",
        "batch_size": "2",
        "learning_rate": "2e-4",
        "lora_r": "16",
    }

    # Environment variables
    environment = {
        "HF_TOKEN": os.environ.get("HF_TOKEN", ""),  # For Llama 3 access
        "MODEL_CACHE": "/opt/ml/model",
    }

    # Create estimator
    estimator = HuggingFace(
        entry_point="sagemaker_entry.py",
        source_dir="scripts",
        instance_type=config["instance_type"],
        instance_count=config["instance_count"],
        volume_size=config["volume_size"],
        role=config["role_arn"],
        transformers_version="4.36.0",
        pytorch_version="2.1.0",
        py_version="py310",
        hyperparameters=hyperparameters,
        environment=environment,
        max_run=config["max_run_hours"] * 3600,
        use_spot_instances=config["use_spot_instances"],
        max_wait=config["max_wait_hours"] * 3600 if config["use_spot_instances"] else None,
        output_path=f"s3://{config['bucket_name']}/outputs/",
        code_location=f"s3://{config['bucket_name']}/code/",
        disable_profiler=True,
        debugger_hook_config=False,
    )

    # Define input data configuration
    inputs = {
        "train": f"s3://{config['bucket_name']}/data/train/",
        "validation": f"s3://{config['bucket_name']}/data/validation/",
    }

    # Launch training job
    print("Launching training job...")
    estimator.fit(inputs, job_name=config["job_name"], wait=False)

    # Get job details
    job_description = sagemaker_session.describe_training_job(
        TrainingJobName=config["job_name"]
    )

    print(f"\n✅ Training job launched successfully!")
    print(f"Job Name: {config['job_name']}")
    print(f"Job ARN: {job_description['TrainingJobArn']}")
    print(f"Instance: {config['instance_type']}")
    print(f"Spot Instances: {config['use_spot_instances']}")
    print(f"Estimated cost: ${estimate_cost(config['instance_type'], config['max_run_hours'])}")
    print(f"\nMonitor job at: https://{session.region_name}.console.aws.amazon.com/sagemaker/home?region={session.region_name}#/training-jobs/{config['job_name']}")

    return config["job_name"]

def estimate_cost(instance_type, hours):
    """Rough cost estimation"""
    pricing = {
        "ml.g5.2xlarge": 1.212,  # per hour
        "ml.g5.4xlarge": 2.176,
        "ml.g5.8xlarge": 4.352,
        "ml.g5.12xlarge": 6.528,
    }

    base_cost = pricing.get(instance_type, 1.5) * hours
    spot_cost = base_cost * 0.3  # ~70% discount for spot

    return round(spot_cost, 2)

def monitor_job(job_name):
    """Monitor training job progress"""
    client = boto3.client('sagemaker')

    print(f"\nMonitoring job: {job_name}")
    print("=" * 50)

    status = "InProgress"
    while status in ["InProgress", "Stopping"]:
        try:
            response = client.describe_training_job(TrainingJobName=job_name)
            status = response['TrainingJobStatus']

            if 'TrainingStartTime' in response:
                elapsed = (time.time() - response['TrainingStartTime'].timestamp()) / 60
                print(f"Status: {status} | Elapsed: {elapsed:.1f} min", end='\r')

            if 'FinalMetricDataList' in response:
                for metric in response['FinalMetricDataList']:
                    print(f"{metric['MetricName']}: {metric['Value']}")

            time.sleep(30)

        except Exception as e:
            print(f"\nError monitoring: {e}")
            break

    print(f"\nFinal Status: {status}")

    if status == "Completed":
        print("✅ Training completed successfully!")
        print(f"Model artifacts: {response.get('ModelArtifacts', {}).get('S3ModelArtifacts', 'N/A')}")
    elif status == "Failed":
        print("❌ Training failed!")
        print(f"Failure reason: {response.get('FailureReason', 'Unknown')}")

    return status

def main():
    """Main function"""
    print("=" * 60)
    print("Llama 3 Fine-Tuning on SageMaker - Launch Script")
    print("=" * 60)

    # Step 1: Create training job
    job_name = create_training_job()

    # Step 2: Ask if user wants to monitor
    monitor = input("\nDo you want to monitor the job? (yes/no): ").lower()
    if monitor in ['yes', 'y']:
        monitor_job(job_name)

    # Step 3: Show next steps
    print("\n" + "=" * 60)
    print("NEXT STEPS:")
    print("=" * 60)
    print("1. Wait for training to complete (2-4 hours)")
    print("2. Check S3 for model artifacts:")
    print(f"   aws s3 ls s3://<your-bucket>/outputs/{job_name}/")
    print("3. Deploy the model:")
    print("   python deploy_model.py --job-name " + job_name)
    print("\nTo check status manually:")
    print(f"   aws sagemaker describe-training-job --training-job-name {job_name}")

if __name__ == "__main__":
    main()
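
The cost printed by the launch script is a rough upper bound based on the maximum run time. As a worked example of the same arithmetic, the sketch below returns both the Spot and the on-demand estimate. The hourly rates are the approximate figures from the script's own pricing table, the 70% Spot discount is the script's assumption rather than a guaranteed rate, and estimate_cost_range is a hypothetical name.

```python
# Hourly rates copied from the pricing table in launch_training.py; treat them
# as rough, region-dependent figures.
HOURLY_RATES = {
    "ml.g5.2xlarge": 1.212,
    "ml.g5.4xlarge": 2.176,
    "ml.g5.8xlarge": 4.352,
    "ml.g5.12xlarge": 6.528,
}


def estimate_cost_range(instance_type, hours, spot_discount=0.7, default_rate=1.5):
    """Return (spot_estimate, on_demand_estimate) in dollars."""
    on_demand = HOURLY_RATES.get(instance_type, default_rate) * hours
    spot = on_demand * (1 - spot_discount)
    return round(spot, 2), round(on_demand, 2)
```

For the default configuration (ml.g5.2xlarge, 4-hour limit) this works out to about $1.45 with the assumed Spot discount and about $4.85 on demand.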

Step 9: Run the Training!

# Make scripts executable
chmod +x launch_training.py
chmod +x scripts/*.py

# Run the launch script
python launch_training.py

# Or run directly with minimal setup
python -c "
import boto3
from sagemaker.huggingface import HuggingFace

# Quick start - minimal configuration
estimator = HuggingFace(
    entry_point='train.py',
    source_dir='scripts',
    instance_type='ml.g5.2xlarge',
    instance_count=1,
    role='your-sagemaker-role-arn',  # Replace with your role
    transformers_version='4.36',
    pytorch_version='2.1',
    py_version='py310',
    hyperparameters={
        'model_name': 'mistralai/Mistral-7B-Instruct-v0.1',
        'num_epochs': 1,  # Start with 1 epoch for testing
    }
)

# Start training
estimator.fit({
    'train': 's3://your-bucket/data/train/',
    'validation': 's3://your-bucket/data/validation/'
}, wait=True)
"

Phase 4: Monitor & Deploy (After Training Completes)

Step 10: Check Training Results

Create check_results.py:

#!/usr/bin/env python3
# check_results.py

import boto3
import json
from datetime import datetime

def check_training_job(job_name):
    """Check training job status and results"""
    client = boto3.client('sagemaker')

    try:
        response = client.describe_training_job(TrainingJobName=job_name)

        print(f"Job Name: {response['TrainingJobName']}")
        print(f"Status: {response['TrainingJobStatus']}")
        print(f"Creation Time: {response['CreationTime']}")

        if 'TrainingEndTime' in response:
            print(f"End Time: {response['TrainingEndTime']}")
            duration = (response['TrainingEndTime'] - response['TrainingStartTime']).total_seconds() / 3600
            print(f"Duration: {duration:.2f} hours")

        if 'ModelArtifacts' in response:
            print(f"\nModel Artifacts: {response['ModelArtifacts']['S3ModelArtifacts']}")

        if 'FinalMetricDataList' in response:
            print("\nFinal Metrics:")
            for metric in response['FinalMetricDataList']:
                print(f"  {metric['MetricName']}: {metric['Value']:.4f}")

        # Check for Spot training savings
        if response.get('EnableManagedSpotTraining', False):
            billable_time = response.get('BillableTimeInSeconds', 0)
            total_time = response.get('TrainingTimeInSeconds', 0)
            if total_time > 0:
                savings = (1 - (billable_time / total_time)) * 100
                print(f"\nSpot Training Savings: {savings:.1f}%")
                print(f"Billable time: {billable_time/3600:.1f}h")
                print(f"Total time: {total_time/3600:.1f}h")

        # Estimate cost from billable time, which already reflects any Spot discount
        instance_type = response['ResourceConfig']['InstanceType']
        billable_hours = response.get(
            'BillableTimeInSeconds', response.get('TrainingTimeInSeconds', 0)
        ) / 3600

        # Rough on-demand pricing (varies by region)
        pricing = {
            'ml.g5.2xlarge': 1.212,
            'ml.g5.4xlarge': 2.176,
            'ml.g5.8xlarge': 4.352,
        }

        hourly_rate = pricing.get(instance_type, 1.5)
        cost = hourly_rate * billable_hours

        print(f"\nEstimated Cost: ${cost:.2f}")

        return response

    except Exception as e:
        print(f"Error: {e}")
        return None

def download_model(job_name, local_dir="model_output"):
    """Download trained model from S3"""
    import os
    from urllib.parse import urlparse
    import tarfile

    # Get model artifacts location
    client = boto3.client('sagemaker')
    response = client.describe_training_job(TrainingJobName=job_name)

    if 'ModelArtifacts' not in response:
        print("No model artifacts found")
        return None

    s3_path = response['ModelArtifacts']['S3ModelArtifacts']

    # Parse S3 URL
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    key = parsed.path.lstrip('/')

    # Create local directory
    os.makedirs(local_dir, exist_ok=True)

    # Download file
    local_file = os.path.join(local_dir, 'model.tar.gz')

    print(f"Downloading model from s3://{bucket}/{key}")
    print(f"To: {local_file}")

    s3 = boto3.client('s3')
    s3.download_file(bucket, key, local_file)

    # Extract if it's a tar file
    if local_file.endswith('.tar.gz'):
        print("Extracting model...")
        with tarfile.open(local_file, 'r:gz') as tar:
            tar.extractall(path=local_dir)

        # Remove tar file
        os.remove(local_file)

    print(f"Model downloaded to: {local_dir}")

    # List contents
    print("\nModel contents:")
    for root, dirs, files in os.walk(local_dir):
        for file in files[:10]:  # Show first 10 files
            print(f"  {os.path.join(root, file)}")

    return local_dir

if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        job_name = sys.argv[1]
    else:
        job_name = input("Enter training job name: ")

    print(f"Checking job: {job_name}")
    print("=" * 60)

    result = check_training_job(job_name)

    if result and result['TrainingJobStatus'] == 'Completed':
        download = input("\nDownload model? (yes/no): ").lower()
        if download in ['yes', 'y']:
            download_model(job_name)

Run it:

# After training completes
python check_results.py your-job-name-here
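
The Spot savings figure comes from two fields of the training job description: BillableTimeInSeconds and TrainingTimeInSeconds. Here is a small worked sketch of that calculation in isolation; the function names are hypothetical and the hourly rate is whatever on-demand price applies in your region.

```python
def spot_savings_percent(billable_seconds, training_seconds):
    """Savings = (1 - billable / total) * 100, or 0 when no time was recorded."""
    if training_seconds <= 0:
        return 0.0
    return (1 - billable_seconds / training_seconds) * 100


def billed_cost(billable_seconds, hourly_rate):
    """Cost of the job at the on-demand hourly rate, using billable time only."""
    return hourly_rate * billable_seconds / 3600
```

For instance, a job that ran for 12,000 seconds but was billed for 3,600 seconds saved about 70%, and at $1.212 per hour it would cost roughly $1.21.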

Step 11: Deploy the Model

Create deploy_model.py:

#!/usr/bin/env python3
# deploy_model.py

import boto3
import json
import time
from sagemaker.huggingface import HuggingFaceModel
from sagemaker import Session

def deploy_finetuned_model(job_name, endpoint_name=None):
    """Deploy the fine-tuned model to a SageMaker endpoint"""

    # Initialize
    session = Session()
    region = session.boto_region_name

    if not endpoint_name:
        endpoint_name = f"ft-{job_name[:30]}"  # Limit to 30 chars

    print(f"Deploying model from job: {job_name}")
    print(f"Endpoint name: {endpoint_name}")
    print(f"Region: {region}")

    # Get model artifacts location
    sm_client = boto3.client('sagemaker', region_name=region)

    try:
        job_info = sm_client.describe_training_job(TrainingJobName=job_name)
        model_s3_path = job_info['ModelArtifacts']['S3ModelArtifacts']

        print(f"Model artifacts: {model_s3_path}")

    except Exception as e:
        print(f"Error getting job info: {e}")
        print("Trying to find model in S3...")

        s3_client = boto3.client('s3')
        model_s3_path = None

        # The bucket name cannot be derived from the job name, so ask for it
        bucket = input("Enter the S3 bucket used for training output: ").strip()
        prefix = f"outputs/{job_name}/"

        try:
            response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
            for obj in response.get('Contents', []):
                if obj['Key'].endswith('output/model.tar.gz'):
                    model_s3_path = f"s3://{bucket}/{obj['Key']}"
                    break
        except Exception as s3_error:
            print(f"Could not list s3://{bucket}/{prefix}: {s3_error}")

        if not model_s3_path:
            model_s3_path = input("Enter full S3 path to model.tar.gz: ").strip()

    # Create HuggingFace model
    print("\nCreating model object...")

    huggingface_model = HuggingFaceModel(
        model_data=model_s3_path,
        role='your-sagemaker-role-arn',  # Replace with your role
        transformers_version='4.36.0',
        pytorch_version='2.1.0',
        py_version='py310',
        env={
            'HF_MODEL_ID': 'mistralai/Mistral-7B-Instruct-v0.1',
            'SM_NUM_GPUS': '1',
            'MAX_INPUT_LENGTH': '512',
            'MAX_TOTAL_TOKENS': '1024',
        }
    )

    # Deploy to endpoint
    print("Deploying endpoint (this will take 5-10 minutes)...")

    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type='ml.g5.xlarge',  # Smaller than training instance
        endpoint_name=endpoint_name,
        wait=True
    )

    print(f"\n✅ Endpoint deployed successfully!")
    print(f"Endpoint name: {endpoint_name}")
    print(f"Instance type: ml.g5.xlarge")
    endpoint_arn = sm_client.describe_endpoint(EndpointName=endpoint_name)['EndpointArn']
    print(f"Endpoint ARN: {endpoint_arn}")

    # Test the endpoint
    print("\nTesting endpoint...")

    test_prompt = {
        "inputs": "### Instruction:\nHow do I reset my password?\n\n### Response:",
        "parameters": {
            "max_new_tokens": 200,
            "temperature": 0.7,
            "top_p": 0.9,
            "do_sample": True
        }
    }

    try:
        response = predictor.predict(test_prompt)
        print("Test response:")
        print(json.dumps(response, indent=2)[:500] + "...")

    except Exception as e:
        print(f"Test failed: {e}")

    return predictor

def test_endpoint(endpoint_name):
    """Test an existing endpoint"""
    # boto3 is already imported at module level
    runtime = boto3.client('sagemaker-runtime')

    prompt = {
        "inputs": "### Instruction:\nWhat's your refund policy?\n\n### Response:",
        "parameters": {
            "max_new_tokens": 100,
            "temperature": 0.1  # Lower temperature for more focused responses
        }
    }

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(prompt)
    )

    result = json.loads(response['Body'].read().decode())
    print("Response from endpoint:")
    print(result[0]['generated_text'])

    return result

def cleanup(endpoint_name):
    """Delete endpoint to stop charges"""
    print(f"Deleting endpoint: {endpoint_name}")

    sm_client = boto3.client('sagemaker')

    try:
        # Look up the config name first; it cannot be read once the endpoint is gone
        endpoint_info = sm_client.describe_endpoint(EndpointName=endpoint_name)
        config_name = endpoint_info['EndpointConfigName']

        sm_client.delete_endpoint(EndpointName=endpoint_name)
        print(f"Endpoint {endpoint_name} deleted")

        # Also delete endpoint config
        try:
            sm_client.delete_endpoint_config(EndpointConfigName=config_name)
            print(f"Endpoint config {config_name} deleted")
        except Exception as e:
            print(f"Could not delete endpoint config {config_name}: {e}")

    except Exception as e:
        print(f"Error deleting endpoint: {e}")

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Deploy fine-tuned model")
    parser.add_argument("--job-name", help="Training job name (required to deploy)")
    parser.add_argument("--endpoint-name", help="Endpoint name (optional when deploying)")
    parser.add_argument("--test", action="store_true", help="Test existing endpoint")
    parser.add_argument("--cleanup", action="store_true", help="Delete endpoint")

    args = parser.parse_args()

    # --test and --cleanup act on an existing endpoint, so they need its name
    if (args.cleanup or args.test) and not args.endpoint_name:
        parser.error("--endpoint-name is required with --test or --cleanup")

    if args.cleanup:
        cleanup(args.endpoint_name)

    elif args.test:
        test_endpoint(args.endpoint_name)

    elif args.job_name:
        deploy_finetuned_model(args.job_name, args.endpoint_name)

    else:
        parser.error("--job-name is required to deploy a model")
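
Text-generation containers often return the prompt followed by the completion in generated_text, so printing result[0]['generated_text'] can echo your instruction back. The helper below is a minimal sketch for trimming that echo; extract_completion is a hypothetical name, and it assumes the response has the list-of-dicts shape that test_endpoint already relies on.

```python
def extract_completion(result, prompt):
    """Return only the newly generated text from a text-generation response.

    Assumes `result` looks like [{"generated_text": "..."}], which is the
    shape test_endpoint() indexes into. Other containers may differ.
    """
    if not isinstance(result, list) or not result:
        raise ValueError(f"Unexpected response shape: {result!r}")
    text = result[0].get("generated_text", "")
    # Drop the echoed prompt if the container included it
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()
```

In test_endpoint you could call it as extract_completion(result, prompt["inputs"]) before printing.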

Run deployment:

# Deploy the model
python deploy_model.py --job-name your-training-job-name

# Test the endpoint
python deploy_model.py --test --endpoint-name ft-your-job-name

# Clean up (important to avoid charges!)
python deploy_model.py --cleanup --endpoint-name ft-your-job-name
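
SageMaker endpoint names may contain only letters, digits, and hyphens, and are limited to 63 characters. The deploy script simply truncates the job name, which works for typical job names. If your job names contain underscores or dots, a small sanitizer along these lines may help. This is a sketch with a hypothetical helper name (make_endpoint_name), not something the script above calls.

```python
import re

def make_endpoint_name(job_name, prefix="ft-", max_len=63):
    """Build an endpoint name that fits SageMaker's naming rules.

    Replaces any run of characters outside [a-zA-Z0-9-] with a hyphen,
    truncates to max_len, and trims leading or trailing hyphens.
    """
    cleaned = re.sub(r"[^a-zA-Z0-9-]+", "-", job_name)
    name = (prefix + cleaned)[:max_len]
    return name.strip("-")

print(make_endpoint_name("my_job.v1"))  # ft-my-job-v1
```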

Phase 5: Production Considerations

Step 12: Create Production Setup Script

Create production_setup.py:

#!/usr/bin/env python3
# production_setup.py

import json

def create_ci_cd_pipeline():
    """Create CI/CD pipeline configuration"""

    pipeline_config = {
        "name": "llama-finetune-pipeline",
        "stages": [
            {
                "name": "DataValidation",
                "script": "scripts/validate_data.py",
                "instance": "ml.m5.large",
                "timeout": 1800
            },
            {
                "name": "Training",
                "script": "scripts/train.py",
                "instance": "ml.g5.2xlarge",
                "use_spot": True,
                "hyperparameters": {
                    "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
                    "num_epochs": 3,
                    "learning_rate": "2e-4"
                }
            },
            {
                "name": "Evaluation",
                "script": "scripts/evaluate.py",
                "instance": "ml.g5.xlarge",
                "metrics": ["accuracy", "perplexity", "bleu"]
            },
            {
                "name": "Deployment",
                "condition": "evaluation.accuracy > 0.85",
                "instance": "ml.g5.xlarge",
                "auto_scale": {
                    "min_capacity": 1,
                    "max_capacity": 5
                }
            }
        ],
        "monitoring": {
            "cloudwatch_metrics": [
                "Invocations",
                "ModelLatency",
                "CPUUtilization",
                "MemoryUtilization"
            ],
            "alarms": [
                {
                    "metric": "ModelLatency",
                    "threshold": 1000000,  # microseconds (1 second); CloudWatch reports ModelLatency in microseconds
                    "periods": 2
                },
                {
                    "metric": "Invocations",
                    "threshold": 1000,  # per minute
                    "periods": 5
                }
            ]
        },
        "cost_tracking": {
            "daily_budget": 50,
            "alarm_threshold": 80,
            "report_frequency": "daily"
        }
    }

    # Save pipeline config
    with open('pipeline_config.json', 'w') as f:
        json.dump(pipeline_config, f, indent=2)

    print("✅ CI/CD pipeline configuration created")
    print("Next steps:")
    print("1. Review pipeline_config.json")
    print("2. Set up CodePipeline in AWS Console")
    print("3. Configure S3 triggers for automatic retraining")
    print("4. Set up CloudWatch alarms for monitoring")

    return pipeline_config

def create_monitoring_dashboard():
    """Create CloudWatch dashboard configuration"""

    dashboard = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["AWS/SageMaker", "Invocations", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"],
                        ["AWS/SageMaker", "ModelLatency", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"]
                    ],
                    "view": "timeSeries",
                    "stacked": False,
                    "region": "us-east-1",
                    "title": "Endpoint Performance"
                }
            },
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["/aws/sagemaker/Endpoints", "CPUUtilization", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"],
                        ["/aws/sagemaker/Endpoints", "MemoryUtilization", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"]
                    ],
                    "view": "gauge",
                    "region": "us-east-1",
                    "title": "Resource Utilization"
                }
            },
            {
                "type": "text",
                "properties": {
                    "markdown": "# Fine-Tuned Model Dashboard\n\n## Key Metrics\n- **Cost Today**: $12.45\n- **Total Invocations**: 12,345\n- **Avg Latency**: 245ms\n- **Error Rate**: 0.12%\n\n## Actions\n- [View Detailed Logs](https://console.aws.amazon.com/cloudwatch/home)\n- [Open SageMaker Console](https://console.aws.amazon.com/sagemaker/home)"
                }
            }
        ]
    }

    with open('dashboard_config.json', 'w') as f:
        json.dump(dashboard, f, indent=2)

    print("✅ Dashboard configuration created")

    return dashboard

def create_cost_estimator():
    """Create cost estimation tool"""

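    estimator = {
        # Illustrative USD/hour figures; confirm against the current SageMaker
        # pricing page for your region before budgeting
        "instance_pricing": {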
            "ml.g5.xlarge": {"on_demand": 1.212, "spot": 0.3636},
            "ml.g5.2xlarge": {"on_demand": 2.176, "spot": 0.6528},
            "ml.g5.4xlarge": {"on_demand": 4.352, "spot": 1.3056},
            "ml.g5.8xlarge": {"on_demand": 8.704, "spot": 2.6112},
            "ml.g5.12xlarge": {"on_demand": 13.056, "spot": 3.9168}
        },
        "training_estimator": {
            "small": {"instances": "ml.g5.2xlarge", "hours": 4, "cost": 8.70},
            "medium": {"instances": "ml.g5.4xlarge", "hours": 8, "cost": 34.82},
            "large": {"instances": "ml.g5.8xlarge", "hours": 16, "cost": 139.26}
        },
        "inference_estimator": {
            "low_traffic": {"instances": "ml.g5.xlarge", "hours": 24, "cost": 29.09},
            "medium_traffic": {"instances": "ml.g5.2xlarge", "hours": 24, "cost": 52.22},
            "high_traffic": {"instances": "ml.g5.4xlarge", "hours": 24, "cost": 104.45}
        }
    }

    with open('cost_estimator.json', 'w') as f:
        json.dump(estimator, f, indent=2)

    print("✅ Cost estimator created")

    # Create simple Python calculator
    calculator_code = '''
def estimate_training_cost(instance_type, hours, use_spot=True):
    """Estimate training cost"""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
        "ml.g5.4xlarge": 4.352,
        "ml.g5.8xlarge": 8.704,
    }

    hourly = pricing.get(instance_type, 2.0)
    if use_spot:
        hourly *= 0.3  # Assumes a 70% spot discount; actual savings vary

    return hourly * hours

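def estimate_monthly_inference(instance_type, requests_per_day, avg_latency_ms=200):
    """Estimate monthly inference cost from compute time actually used.

    A real-time endpoint is billed for every hour it is running, whether or
    not it serves traffic, so for an always-on endpoint treat this figure as
    a lower bound (hourly rate * 24 * 30 is the always-on cost).
    """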
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
    }

    # Calculate instance hours needed
    total_processing_seconds = requests_per_day * (avg_latency_ms / 1000)
    instance_hours = total_processing_seconds / 3600

    # Add 20% buffer
    instance_hours *= 1.2

    hourly = pricing.get(instance_type, 1.5)
    daily_cost = hourly * instance_hours
    monthly_cost = daily_cost * 30

    return {
        "daily_cost": round(daily_cost, 2),
        "monthly_cost": round(monthly_cost, 2),
        "instance_hours_per_day": round(instance_hours, 2)
    }
'''

    with open('cost_calculator.py', 'w') as f:
        f.write(calculator_code)

    return estimator

if __name__ == "__main__":
    print("Setting up production configuration...")
    print("=" * 60)

    # Create all configurations
    pipeline = create_ci_cd_pipeline()
    dashboard = create_monitoring_dashboard()
    cost_config = create_cost_estimator()

    print("\n" + "=" * 60)
    print("PRODUCTION SETUP COMPLETE")
    print("=" * 60)
    print("\nCreated files:")
    print("1. pipeline_config.json - CI/CD pipeline configuration")
    print("2. dashboard_config.json - CloudWatch dashboard")
    print("3. cost_estimator.json - Cost estimation data")
    print("4. cost_calculator.py - Python cost calculator")

    print("\nNext steps for production:")
    print("1. Set up AWS Budgets with alerts")
    print("2. Configure VPC for private endpoint access")
    print("3. Set up logging to S3 for compliance")
    print("4. Implement A/B testing for model versions")
    print("5. Create automated retraining pipeline")
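
The Deployment stage in pipeline_config.json carries a condition string ("evaluation.accuracy > 0.85"), but nothing in this script evaluates it. One way a pipeline runner might check such a gate without calling eval() is sketched below. The function name gate_passes and the assumption that metrics arrive as a plain dict are illustrative only.

```python
import operator
import re

# Comparison operators the gate understands
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_PATTERN = re.compile(r"^\s*evaluation\.(\w+)\s*(>=|<=|>|<)\s*([0-9.]+)\s*$")

def gate_passes(condition, metrics):
    """Evaluate a condition like 'evaluation.accuracy > 0.85' against metrics.

    `metrics` is a dict such as {"accuracy": 0.91}. Raises ValueError for a
    condition the pattern does not recognise and KeyError for a missing metric.
    """
    match = _PATTERN.match(condition)
    if not match:
        raise ValueError(f"Unsupported condition: {condition!r}")
    name, op, threshold = match.groups()
    if name not in metrics:
        raise KeyError(f"Metric {name!r} not found in evaluation results")
    return _OPS[op](metrics[name], float(threshold))

print(gate_passes("evaluation.accuracy > 0.85", {"accuracy": 0.91}))  # True
```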

Troubleshooting Common Issues

Issue 1: "No space left on device"

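# This error means the instance's storage volume is full: raise volume_size
# on the estimator in your launch script and keep fewer checkpoints.
# The memory settings below help if you also hit GPU out-of-memory errors.
# Add to training script:
training_args = TrainingArguments(
    save_total_limit=2,  # Keep only the newest checkpoints to save disk
    gradient_checkpointing=True,  # Reduces memory
    gradient_accumulation_steps=4,  # Simulates larger batch
    fp16=False,  # Use bf16 instead
    bf16=True,
)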
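
To see what gradient_accumulation_steps=4 buys you, it helps to work out the effective batch size and the resulting number of optimizer steps. The arithmetic below is a rough sketch: the function names are hypothetical, and the step count is an approximation that ignores how the trainer handles a partial final batch.

```python
import math

def effective_batch_size(per_device_batch, accumulation_steps, num_gpus=1):
    """Examples contributing to each optimizer update."""
    return per_device_batch * accumulation_steps * num_gpus

def approx_optimizer_steps(num_examples, epochs, per_device_batch,
                           accumulation_steps, num_gpus=1):
    """Approximate total optimizer updates over the whole run."""
    batch = effective_batch_size(per_device_batch, accumulation_steps, num_gpus)
    return math.ceil(num_examples / batch) * epochs

# A per-device batch of 2 with 4 accumulation steps behaves like a batch of 8
print(effective_batch_size(2, 4))               # 8
print(approx_optimizer_steps(1000, 3, 2, 4))    # 375
```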

Issue 2: Training too slow

python

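# Switch to an instance with more GPUs
# ml.g5.2xlarge and ml.g5.4xlarge both have a single A10G GPU, so moving
# between them adds CPU and RAM but does little for GPU-bound training.
# ml.g5.12xlarge has 4 GPUs (higher hourly cost, less-than-linear speedup).
# If memory is the limit, use gradient accumulation instead of a larger batch size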

Issue 3: Model not learning

python

# Check your data format
# Lower learning rate: 2e-4 → 1e-4
# Increase epochs: 3 → 5
# Add more diverse training examples
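
"Check your data format" is easier to act on with a quick validator. The sketch below assumes each training record is a JSON object with non-empty "instruction" and "response" strings. Those field names are an assumption made for illustration, so adjust the required tuple to match whatever your prepare_data.py actually writes.

```python
def validate_records(records, required=("instruction", "response")):
    """Return a list of (index, problem) tuples; an empty list means all good."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "not a JSON object"))
            continue
        for key in required:
            value = rec.get(key)
            # Each required field must be a non-blank string
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems

sample = [
    {"instruction": "How do I reset my password?", "response": "Open Settings."},
    {"instruction": "What's your refund policy?", "response": ""},
]
print(validate_records(sample))  # [(1, "missing or empty 'response'")]
```

Run it over the list you get from json.load on your training file before uploading to S3.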

Quick Start - One Command Setup

Create setup.sh:

#!/bin/bash
# setup.sh - Complete setup script
set -eo pipefail  # Stop at the first failed step instead of continuing

echo "🚀 Starting Llama 3 Fine-Tuning Setup..."
echo "=========================================="

# Step 1: Setup environment
echo "1. Setting up Python environment..."
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Step 2: Prepare data
echo "2. Preparing sample data..."
python data/prepare_data.py

# Step 3: Setup AWS (interactive)
echo "3. Setting up AWS..."
read -r -p "Enter your SageMaker Role ARN: " ROLE_ARN
read -r -p "Enter S3 bucket name: " BUCKET_NAME
# Exported so launch_training.py can read them from the environment,
# if it is written to do so; otherwise set them in that script
export ROLE_ARN BUCKET_NAME

# Step 4: Upload to S3
echo "4. Uploading to S3..."
aws s3 mb "s3://$BUCKET_NAME"
aws s3 cp data/train.json "s3://$BUCKET_NAME/data/train/"
aws s3 cp data/validation.json "s3://$BUCKET_NAME/data/validation/"

# Step 5: Launch training
echo "5. Launching training job..."
python launch_training.py

echo "✅ Setup complete!"
echo "Training job launched. Check AWS Console for progress."

Make it executable and run:

bash

chmod +x setup.sh
./setup.sh
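
setup.sh assumes that requirements.txt, data/prepare_data.py, and launch_training.py already exist and that the AWS CLI is on your PATH. A short pre-flight check can catch a missing piece before any AWS resources are created. The following is a sketch; preflight is a hypothetical helper and is not part of the scripts above.

```python
import shutil
from pathlib import Path

# Files that setup.sh reads or runs
REQUIRED_FILES = ("requirements.txt", "data/prepare_data.py", "launch_training.py")

def preflight(project_dir=".", required_files=REQUIRED_FILES, commands=("aws",)):
    """Return a list of missing files and commands; empty means ready to run."""
    root = Path(project_dir)
    missing = [f"file: {rel}" for rel in required_files
               if not (root / rel).is_file()]
    # shutil.which returns None when a command is not found on PATH
    missing += [f"command: {cmd}" for cmd in commands
                if shutil.which(cmd) is None]
    return missing

if __name__ == "__main__":
    problems = preflight()
    print("Ready to run setup.sh" if not problems else "\n".join(problems))
```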

Summary: Your Complete Path

Hour 0-1: Setup AWS, install dependencies, prepare data
Hour 1-2: Configure SageMaker, upload data to S3
Hour 2-3: Launch training job (runs for 2-4 hours)
Hour 6-7: Check results, download model
Hour 7-8: Deploy endpoint, test inference
Hour 8+: Set up monitoring, CI/CD, production features

Total hands-on time: 2-3 hours
Total wait time: 2-4 hours (training) + 10-15 minutes (deployment)
Total cost: $10-50 depending on configuration
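
To see where a total in that range comes from, and how quickly a forgotten endpoint changes it, the arithmetic can be sketched as follows. The hourly rates are the illustrative figures from cost_estimator.json in Step 12, not quoted AWS prices, and the function names are made up for this example.

```python
def training_cost(hourly_rate, hours, spot_discount=0.0):
    """Cost of one training run; spot_discount is a fraction such as 0.7."""
    return round(hourly_rate * hours * (1 - spot_discount), 2)

def endpoint_cost(hourly_rate, hours_running, instances=1):
    """Cost of keeping a real-time endpoint up, regardless of traffic."""
    return round(hourly_rate * hours_running * instances, 2)

# Illustrative rates taken from cost_estimator.json (USD per hour)
G5_2XLARGE = 2.176
G5_XLARGE = 1.212

print(training_cost(G5_2XLARGE, 4))                      # on-demand, 4 hours
print(training_cost(G5_2XLARGE, 4, spot_discount=0.7))   # same run on spot
print(endpoint_cost(G5_XLARGE, 24))                      # endpoint up for a day
print(endpoint_cost(G5_XLARGE, 24 * 30))                 # endpoint left up for a month
```

At these rates a four-hour training run plus a day of testing stays well inside the range, while an endpoint left running for a month costs far more than the training itself, which is why the cleanup step matters.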

Need help? Common issues and solutions:

Permission errors: Make sure your IAM role has the AmazonSageMakerFullAccess policy attached
Out of memory: Reduce batch size, enable gradient checkpointing
Training too slow: Use an instance with more GPUs (spot instances lower cost, not training time)
Model not loading: Check Hugging Face token for Llama 3 access

This is the complete, end-to-end guide with every single step. Copy and run each command in order, and you'll have a fine-tuned model running in production.
