How to Fine-Tune Llama 3 on AWS Without Breaking the Bank: A Practical Guide


Software Engineer / AWS Cloud Engineer / DevOps Engineer / Team Lead

Phase 1: Setup & Preparation (30-45 minutes)

Step 1: AWS Account & Permissions Setup

1.1 Login to AWS Console

  • Go to https://aws.amazon.com and sign in

  • If you are new to AWS, create an account (there is a free tier, but a payment method is required, and the GPU instances used below are not covered by it)

1.2 Create IAM User for SageMaker (Don't use root!)

    1. Go to IAM Service

    2. Click "Users" → "Create user"

    3. Username: sagemaker-user

    4. Select "Attach policies directly"

    5. Add these policies:

      • AmazonSageMakerFullAccess

      • AmazonS3FullAccess

      • AWSCloudFormationFullAccess

      • IAMFullAccess (temporarily, for setup; detach it once the SageMaker execution role exists)

    6. Click "Create user"

    7. Go to "Security credentials" tab

    8. Click "Create access key"

    9. Select "Command Line Interface (CLI)"

    10. Copy the Access Key ID and Secret Access Key
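
The managed policies in step 5 are broad. Once the setup works, AmazonS3FullAccess can usually be narrowed to the single bucket this tutorial uses. The following is a minimal sketch of such a policy document, not a vetted security baseline. The bucket name `llama3-finetune-example` and the helper `bucket_scoped_policy` are hypothetical, and the policy assumes the bucket already exists (it does not grant `s3:CreateBucket`).

```python
import json

def bucket_scoped_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {   # Reading and writing apply to the objects inside it
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }

policy = bucket_scoped_policy("llama3-finetune-example")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy in place of the full-access managed policy.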

      1.3 Configure AWS CLI on Your Machine

      # Install AWS CLI (if not installed)
      # For Mac:
      brew install awscli
      # For Ubuntu:
      sudo apt-get install awscli
      # For Windows (PowerShell):
      winget install -e --id Amazon.AWSCLI
      
      # Configure AWS CLI
      aws configure
      # Enter:
      # AWS Access Key ID: [paste from step above]
      # AWS Secret Access Key: [paste from step above]
      # Default region: us-east-1 (or your preferred region)
      # Default output format: json
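
Before moving on, it can help to confirm that the stored credentials belong to the IAM user and not to the root account. The sketch below assumes boto3 is installed (it is installed in step 3.2). The helper names `summarize_identity` and `check_credentials` are illustrative, and the sample response uses made-up values in the shape that STS `GetCallerIdentity` returns.

```python
def summarize_identity(identity: dict) -> dict:
    """Reduce an STS GetCallerIdentity response to the fields worth checking."""
    arn = identity["Arn"]
    return {
        "account": identity["Account"],
        "arn": arn,
        # The root user's ARN ends in ":root"; this tutorial expects an IAM user
        "is_root": arn.endswith(":root"),
    }

def check_credentials() -> dict:
    """Call STS with whatever credentials `aws configure` stored."""
    import boto3  # imported lazily so summarize_identity works without boto3
    return summarize_identity(boto3.client("sts").get_caller_identity())

# Hand-written response in the shape STS returns (values are made up)
sample = {
    "UserId": "AIDAEXAMPLE",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/sagemaker-user",
}
print(summarize_identity(sample))
```

Calling `check_credentials()` on a configured machine performs the real lookup; if `is_root` comes back true, the access key was created on the wrong identity.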
      


      Step 2: Request Model Access

      2.1 Get Llama 3 Access on Hugging Face

      # 1. Go to https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
      # 2. Click "Request Access"
      # 3. Fill the form (use your real details)
      # 4. Wait for approval (usually within hours)
      
      # Alternative: Use a different open model that doesn't require approval
      # We'll use "mistralai/Mistral-7B-Instruct-v0.1" for this tutorial
      # No approval needed!
      

      2.2 Create Hugging Face Token (for Llama 3, if approved)

      1. Go to https://huggingface.co
      2. Sign up/login
      3. Click profile → Settings → Access Tokens
      4. Click "New token"
      5. Name: aws-sagemaker
      6. Role: Read is enough to download gated models; choose Write only if you plan to upload models
      7. Copy the token
      

      Step 3: Prepare Your Local Environment

      3.1 Create Project Directory Structure


      mkdir llama3-finetune-tutorial
      cd llama3-finetune-tutorial
      
      # Create the directory structure
      mkdir -p scripts data configs outputs
      mkdir -p docker train deploy monitor
      

      3.2 Create Virtual Environment & Install Dependencies

      # Create virtual environment
      python -m venv venv
      
      # Activate it
      # On Mac/Linux:
      source venv/bin/activate
      
      # On Windows:
      # venv\Scripts\activate
      
      # Install required packages
      pip install --upgrade pip
      pip install boto3 sagemaker awscli
      pip install transformers==4.36.0
      pip install datasets==2.14.0
      pip install peft==0.7.0
      pip install accelerate==0.25.0
      pip install bitsandbytes==0.41.3
      pip install torch==2.1.0
      pip install scikit-learn
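
Pinned versions only help if the environment actually matches them. As a rough sketch, the check below compares what is installed against the pins from the commands above, using the standard library's `importlib.metadata`. The helper name `find_mismatches` is an assumption for illustration.

```python
from importlib import metadata

# Pins copied from the pip commands above
PINNED = {
    "transformers": "4.36.0",
    "datasets": "2.14.0",
    "peft": "0.7.0",
    "accelerate": "0.25.0",
    "bitsandbytes": "0.41.3",
    "torch": "2.1.0",
}

def find_mismatches(pinned: dict, lookup=metadata.version) -> dict:
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed at all
        # Local builds such as "2.1.0+cu121" still count as 2.1.0
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches

for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

No output means every pin is satisfied in the active virtual environment.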
      

      Step 4: Prepare Training Data

      4.1 Create Sample Training Data

      # Create data directory
      cd data
      

      Create prepare_data.py:

      # data/prepare_data.py
      import json
      
      def create_sample_data():
          """Create example training data for customer service chatbot"""
      
          examples = [
              {
                  "instruction": "How do I reset my account password?",
                  "input": "",
                  "output": "To reset your password, visit our login page and click 'Forgot Password'. Enter your email address and check your inbox for a reset link. The link expires in 24 hours. If you don't receive the email, check your spam folder or contact support@ourcompany.com."
              },
              {
                  "instruction": "What's your refund policy?",
                  "input": "Customer purchased item 2 weeks ago",
                  "output": "We offer a 30-day money-back guarantee for all products. Since your purchase was 2 weeks ago, you're eligible for a full refund. Please go to 'My Orders' in your account, select the item, and click 'Request Refund'. Refunds are processed within 5-7 business days."
              },
              {
                  "instruction": "Do you offer international shipping?",
                  "input": "",
                  "output": "Yes, we ship to over 50 countries. International shipping takes 7-14 business days and costs $25. Additional customs fees may apply depending on your country's regulations. You can see exact shipping costs at checkout."
              },
              {
                  "instruction": "How do I contact customer support?",
                  "input": "",
                  "output": "You can contact our support team in three ways: 1) Email: support@ourcompany.com (response within 24 hours), 2) Live Chat: Available Mon-Fri 9AM-6PM EST on our website, 3) Phone: 1-800-555-HELP (Mon-Fri 8AM-8PM EST)."
              },
              {
                  "instruction": "What payment methods do you accept?",
                  "input": "Customer is from Europe",
                  "output": "We accept Visa, MasterCard, American Express, PayPal, and Apple Pay. For European customers, we also support SEPA transfers and iDEAL. All payments are processed securely through PCI-compliant systems."
              }
          ]
      
          # Create 100 synthetic examples (for demo - in real life, use real data)
          all_examples = []
          base_scenarios = examples.copy()
      
          for i in range(100):
              base = base_scenarios[i % len(base_scenarios)]
              new_example = base.copy()
      
              # Add some variation
              if "password" in new_example["instruction"].lower():
                  variations = [
                      "I forgot my password",
                      "Can't login to my account",
                      "Need to change my password"
                  ]
                  new_example["instruction"] = variations[i % len(variations)]
      
              # Format for training
              text = f"### Instruction:\n{new_example['instruction']}\n\n"
              if new_example['input']:
                  text += f"### Input:\n{new_example['input']}\n\n"
              text += f"### Response:\n{new_example['output']}"
      
              all_examples.append({"text": text})
      
          # Save to JSON
          with open('train.json', 'w') as f:
              json.dump(all_examples, f, indent=2)
      
          # Also save in instruction format
          instruction_examples = []
          for ex in all_examples:
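              lines = ex['text'].split('\n')
              # lines[0] is the "### Instruction:" header; the instruction itself is on the next line
              instruction = lines[1].strip()
              # The response is the single line after the "### Response:" header, i.e. the last line
              response = lines[-1].strip()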
              instruction_examples.append({
                  "instruction": instruction,
                  "response": response
              })
      
          with open('instructions.json', 'w') as f:
              json.dump(instruction_examples, f, indent=2)
      
          print(f"Created {len(all_examples)} training examples")
          print(f"Sample: {all_examples[0]['text'][:200]}...")
      
          return all_examples
      
      if __name__ == "__main__":
          create_sample_data()
      

      Run it:

      python prepare_data.py
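
Because the training script only sees the final "text" strings, a malformed record is easy to miss. The sketch below parses each record back into its sections and fails loudly on anything that does not match the layout written by prepare_data.py. The names `RECORD_PATTERN`, `parse_record`, and `validate_file` are illustrative and not part of the tutorial's scripts.

```python
import json
import re

# Matches the prompt layout written by prepare_data.py; the Input section is optional
RECORD_PATTERN = re.compile(
    r"\A### Instruction:\n(?P<instruction>.+?)\n\n"
    r"(?:### Input:\n(?P<input>.+?)\n\n)?"
    r"### Response:\n(?P<response>.+)\Z",
    re.DOTALL,
)

def parse_record(text: str) -> dict:
    """Split one training string back into its sections, or raise ValueError."""
    match = RECORD_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unexpected record layout: {text[:60]!r}")
    return {key: (value or "").strip() for key, value in match.groupdict().items()}

def validate_file(path: str) -> int:
    """Parse every record in a JSON list of {"text": ...} objects; return the count."""
    with open(path) as f:
        records = json.load(f)
    for record in records:
        parse_record(record["text"])
    return len(records)

sample = (
    "### Instruction:\nWhat's your refund policy?\n\n"
    "### Input:\nCustomer purchased item 2 weeks ago\n\n"
    "### Response:\nWe offer a 30-day money-back guarantee."
)
print(parse_record(sample))
```

Running `validate_file("train.json")` from the data directory should return the number of records if every entry is well formed.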
      

      4.2 Create Validation Data
      Create validation.json (in the data directory, next to train.json):

      [
        {
          "text": "### Instruction:\nHow do I track my order?\n\n### Response:\nYou can track your order by logging into your account and going to 'Order History'. Click on the order number to see tracking details. You'll receive tracking emails at every major shipment milestone. For urgent inquiries, contact support@ourcompany.com."
        },
        {
          "text": "### Instruction:\nDo you have a mobile app?\n\n### Input:\nCustomer uses iPhone\n\n### Response:\nYes, we have both iOS and Android apps. You can download our iOS app from the App Store by searching 'OurCompany'. The app includes all website features plus push notifications for order updates and exclusive mobile-only deals."
        }
      ]
      

      Phase 2: SageMaker Setup (20 minutes)

      Step 5: Create S3 Bucket for Data & Models

      5.1 Create Bucket

      # Create unique bucket name (must be globally unique)
      BUCKET_NAME="llama3-finetune-$(date +%s)-$RANDOM"
      echo "Bucket name: $BUCKET_NAME"
      
      # Create bucket
      aws s3 mb s3://$BUCKET_NAME
      
      # Create folder structure
      aws s3api put-object --bucket $BUCKET_NAME --key data/train/
      aws s3api put-object --bucket $BUCKET_NAME --key data/validation/
      aws s3api put-object --bucket $BUCKET_NAME --key models/
      aws s3api put-object --bucket $BUCKET_NAME --key outputs/
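
Bucket names must be globally unique and follow S3's naming rules, which is why the shell snippet appends a timestamp and a random number. The same idea can be sketched in Python, with a check that is deliberately stricter than S3 itself (it rejects dots, which S3 allows). The helper names are assumptions for illustration.

```python
import random
import re
import time

# Subset of the S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits and hyphens, starting and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")

def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the simplified rules above."""
    return BUCKET_NAME_RE.fullmatch(name) is not None

def make_bucket_name(prefix: str = "llama3-finetune") -> str:
    """Mirror the shell one-liner: prefix, Unix timestamp, random suffix."""
    name = f"{prefix}-{int(time.time())}-{random.randint(0, 32767)}"
    if not is_valid_bucket_name(name):
        raise ValueError(f"Not a valid bucket name: {name}")
    return name

print(make_bucket_name())
```

Whichever way the name is generated, note it down: the launch script later needs the exact bucket name.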
      

      5.2 Upload Data to S3

      # Upload training and validation data (run from the project root; cd .. first if you are still in data/)
      aws s3 cp data/train.json s3://$BUCKET_NAME/data/train/train.json
      aws s3 cp data/validation.json s3://$BUCKET_NAME/data/validation/validation.json
      
      # Verify upload
      aws s3 ls s3://$BUCKET_NAME/data/train/
      aws s3 ls s3://$BUCKET_NAME/data/validation/
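
The prefix each file lands under matters, because the launch script later points the "train" and "validation" channels at data/train/ and data/validation/. As a sketch of the same upload in Python, the mapping can be computed first and inspected before anything is sent. The bucket name is hypothetical and the helpers `plan_uploads` and `upload` are illustrative.

```python
from pathlib import PurePosixPath

def plan_uploads(bucket: str, files: dict) -> list:
    """Map {local_path: channel} to (local_path, s3_uri) pairs under data/<channel>/."""
    plan = []
    for local_path, channel in files.items():
        key = f"data/{channel}/{PurePosixPath(local_path).name}"
        plan.append((local_path, f"s3://{bucket}/{key}"))
    return plan

def upload(bucket: str, files: dict) -> None:
    """Upload each file with boto3; requires working AWS credentials."""
    import boto3  # imported lazily so plan_uploads can be used without it
    s3 = boto3.client("s3")
    for local_path, uri in plan_uploads(bucket, files):
        key = uri.split("/", 3)[3]  # strip the leading "s3://<bucket>/"
        s3.upload_file(local_path, bucket, key)

plan = plan_uploads(
    "llama3-finetune-example",  # hypothetical bucket name
    {"data/train.json": "train", "data/validation.json": "validation"},
)
for local_path, uri in plan:
    print(f"{local_path} -> {uri}")
```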
      

      Step 6: Create SageMaker Training Script

      Create scripts/train.py:

      #!/usr/bin/env python3
      # scripts/train.py
      
      import os
      import sys
      import json
      import torch
      import logging
      from pathlib import Path
      
      # Add project root to path
      sys.path.append(str(Path(__file__).parent.parent))
      
      from transformers import (
          AutoModelForCausalLM,
          AutoTokenizer,
          Trainer,
          TrainingArguments,
          DataCollatorForLanguageModeling,
          BitsAndBytesConfig
      )
      from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
      from datasets import load_dataset, Dataset
      import numpy as np
      
      # Set up logging
      logging.basicConfig(level=logging.INFO)
      logger = logging.getLogger(__name__)
      
      class LLMTrainer:
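          def __init__(self, config_path=None):
              """Initialize trainer with configuration"""
              if config_path is None:
                  # SageMaker uploads only source_dir, so check for a copy bundled
                  # next to this script before falling back to the project-level configs/
                  here = Path(__file__).resolve().parent
                  candidates = [
                      here / "configs" / "training_config.json",
                      here.parent / "configs" / "training_config.json",
                  ]
                  config_path = next((p for p in candidates if p.exists()), candidates[-1])
              with open(config_path, 'r') as f:
                  self.config = json.load(f)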
      
              logger.info(f"Configuration loaded: {self.config}")
      
              # Set device
              self.device = "cuda" if torch.cuda.is_available() else "cpu"
              logger.info(f"Using device: {self.device}")
      
          def load_model_and_tokenizer(self):
              """Load base model and tokenizer"""
              logger.info(f"Loading model: {self.config['model_name']}")
      
              # Configure 4-bit quantization to save memory
              bnb_config = BitsAndBytesConfig(
                  load_in_4bit=True,
                  bnb_4bit_quant_type="nf4",
                  bnb_4bit_compute_dtype=torch.bfloat16,
                  bnb_4bit_use_double_quant=True
              )
      
              # Load model with quantization
              self.model = AutoModelForCausalLM.from_pretrained(
                  self.config["model_name"],
                  quantization_config=bnb_config,
                  device_map="auto",
                  trust_remote_code=True,
                  token=os.environ.get("HF_TOKEN") or None  # use_auth_token is deprecated; gated models need a token
              )
      
              # Load tokenizer
              self.tokenizer = AutoTokenizer.from_pretrained(
                  self.config["model_name"],
                  trust_remote_code=True,
                  token=os.environ.get("HF_TOKEN") or None  # use_auth_token is deprecated; gated models need a token
              )
      
              # Set padding token
              if self.tokenizer.pad_token is None:
                  self.tokenizer.pad_token = self.tokenizer.eos_token
      
              logger.info(f"Model loaded: {self.config['model_name']}")
              logger.info(f"Tokenizer vocab size: {len(self.tokenizer)}")
      
          def prepare_model_for_training(self):
              """Apply LoRA configuration to model"""
              logger.info("Preparing model for LoRA training...")
      
              # Prepare model for k-bit training
              self.model = prepare_model_for_kbit_training(self.model)
      
              # Configure LoRA
              lora_config = LoraConfig(
                  r=self.config["lora_r"],
                  lora_alpha=self.config["lora_alpha"],
                  target_modules=self.config["lora_target_modules"],
                  lora_dropout=self.config["lora_dropout"],
                  bias="none",
                  task_type="CAUSAL_LM"
              )
      
              # Apply LoRA
              self.model = get_peft_model(self.model, lora_config)
      
              # Print trainable parameters
              self.model.print_trainable_parameters()
      
          def load_and_tokenize_data(self):
              """Load and tokenize training data"""
              logger.info("Loading training data...")
      
              # Get data paths from environment (SageMaker sets these)
              train_data_path = os.environ.get('SM_CHANNEL_TRAIN', 'data/train')
              val_data_path = os.environ.get('SM_CHANNEL_VALIDATION', 'data/validation')
      
              logger.info(f"Train data path: {train_data_path}")
              logger.info(f"Validation data path: {val_data_path}")
      
              # Load datasets
              train_files = [str(f) for f in Path(train_data_path).glob("*.json")]
              val_files = [str(f) for f in Path(val_data_path).glob("*.json")]
      
              train_dataset = load_dataset('json', data_files=train_files)
              val_dataset = load_dataset('json', data_files=val_files) if val_files else None
      
              # Tokenization function
              def tokenize_function(examples):
                  return self.tokenizer(
                      examples["text"],
                      truncation=True,
                      padding="max_length",
                      max_length=self.config["max_length"]
                  )
      
              # Tokenize datasets
              tokenized_train = train_dataset.map(
                  tokenize_function,
                  batched=True,
                  remove_columns=train_dataset["train"].column_names
              )
      
              if val_dataset:
                  tokenized_val = val_dataset.map(
                      tokenize_function,
                      batched=True,
                      remove_columns=val_dataset["train"].column_names
                  )
              else:
                  tokenized_val = None
      
              logger.info(f"Training samples: {len(tokenized_train['train'])}")
              if tokenized_val:
                  logger.info(f"Validation samples: {len(tokenized_val['train'])}")
      
              return tokenized_train["train"], tokenized_val["train"] if tokenized_val else None
      
          def train(self):
              """Main training loop"""
              logger.info("Starting training process...")
      
              # Load model and tokenizer
              self.load_model_and_tokenizer()
      
              # Prepare for LoRA training
              self.prepare_model_for_training()
      
              # Load and tokenize data
              train_dataset, val_dataset = self.load_and_tokenize_data()
      
              # Create data collator
              data_collator = DataCollatorForLanguageModeling(
                  tokenizer=self.tokenizer,
                  mlm=False
              )
      
              # Set output directory
              output_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")  # SageMaker packages this directory as model.tar.gz
      
              # Configure training arguments
              training_args = TrainingArguments(
                  output_dir=output_dir,
                  num_train_epochs=self.config["num_epochs"],
                  per_device_train_batch_size=self.config["batch_size"],
                  per_device_eval_batch_size=self.config["batch_size"],
                  gradient_accumulation_steps=self.config["gradient_accumulation_steps"],
                  warmup_steps=self.config["warmup_steps"],
                  logging_steps=self.config["logging_steps"],
                  save_steps=self.config["save_steps"],
                  eval_steps=self.config["eval_steps"] if val_dataset else None,
                  evaluation_strategy="steps" if val_dataset else "no",
                  save_strategy="steps",
                  save_total_limit=2,
                  load_best_model_at_end=True if val_dataset else False,
                  metric_for_best_model="eval_loss" if val_dataset else None,
                  greater_is_better=False if val_dataset else None,
                  learning_rate=self.config["learning_rate"],
                  weight_decay=self.config["weight_decay"],
                  fp16=False,
                  bf16=self.config.get("bf16", False),
                  gradient_checkpointing=self.config["gradient_checkpointing"],
                  optim=self.config["optimizer"],
                  report_to=["tensorboard"],
                  ddp_find_unused_parameters=False,
                  remove_unused_columns=False
              )
      
              # Initialize Trainer
              trainer = Trainer(
                  model=self.model,
                  args=training_args,
                  train_dataset=train_dataset,
                  eval_dataset=val_dataset,
                  data_collator=data_collator,
              )
      
              # Start training
              logger.info("Training started...")
              train_result = trainer.train()
      
              # Save model
              trainer.save_model()
              self.tokenizer.save_pretrained(output_dir)
      
              # Save training metrics
              metrics = train_result.metrics
              trainer.log_metrics("train", metrics)
              trainer.save_metrics("train", metrics)
      
              if val_dataset:
                  eval_metrics = trainer.evaluate()
                  trainer.log_metrics("eval", eval_metrics)
                  trainer.save_metrics("eval", eval_metrics)
      
              logger.info(f"Training completed! Model saved to {output_dir}")
      
              return metrics
      
      def main():
          """Main entry point"""
          try:
              # Check if running in SageMaker
              sm_training_env = os.environ.get('SM_TRAINING_ENV', '')
              if sm_training_env:
                  logger.info(f"Running in SageMaker environment: {sm_training_env}")
      
              # Initialize and run trainer
              trainer = LLMTrainer()
              metrics = trainer.train()
      
              logger.info("Training completed successfully!")
              logger.info(f"Final metrics: {metrics}")
      
          except Exception as e:
              logger.error(f"Training failed with error: {str(e)}")
              raise
      
      if __name__ == "__main__":
          main()
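
To get a feel for what `print_trainable_parameters()` should report, the LoRA parameter count can be worked out by hand: each adapted weight matrix of shape (d_in, d_out) gains two low-rank factors with r * (d_in + d_out) parameters in total. The sketch below applies that to the four attention projections targeted in the config. The layer shapes are assumptions taken from Mistral-7B's published configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), and the helper name is illustrative.

```python
def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """Count LoRA parameters: each adapted (d_in, d_out) matrix adds r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

# Attention projection shapes (d_in, d_out), assumed from Mistral-7B's config:
# k_proj and v_proj map 4096 -> 1024 because of grouped-query attention.
MISTRAL_7B_ATTENTION = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

total = lora_param_count(r=16, shapes=MISTRAL_7B_ATTENTION, num_layers=32)
print(f"LoRA trainable parameters: {total:,}")  # 13,631,488
```

That is well under 1% of the roughly 7 billion base parameters, which is why the adapter fits alongside a 4-bit model on a single 24 GB GPU.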
      

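      Create configs/training_config.json. SageMaker uploads only source_dir (scripts/) to the training container, so also keep a copy at scripts/configs/training_config.json, where train.py can find it inside the container: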

      {
        "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
        "num_epochs": 3,
        "batch_size": 2,
        "gradient_accumulation_steps": 4,
        "learning_rate": 2e-4,
        "weight_decay": 0.01,
        "warmup_steps": 100,
        "logging_steps": 50,
        "save_steps": 100,
        "eval_steps": 100,
        "max_length": 512,
        "lora_r": 16,
        "lora_alpha": 32,
        "lora_dropout": 0.1,
        "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
        "gradient_checkpointing": true,
        "bf16": true,
        "optimizer": "adamw_8bit"
      }
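
The step-based settings in this config are worth checking against the size of the dataset. With the 100 examples produced by prepare_data.py, the run has far fewer optimizer steps than warmup_steps, save_steps, or eval_steps. The sketch below approximates how the Hugging Face Trainer derives its total step count on a single GPU; treat it as an estimate, and the helper names as assumptions.

```python
import math

def optimizer_steps(num_examples, batch_size, grad_accum, epochs, num_gpus=1):
    """Approximate the Trainer's total update steps for a fixed-size dataset."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_gpus))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)

def schedule_warnings(total_steps, config):
    """List step-based settings that are larger than the whole run."""
    keys = ("warmup_steps", "logging_steps", "save_steps", "eval_steps")
    return [
        f"{key}={config[key]} exceeds the {total_steps} total steps"
        for key in keys
        if config[key] > total_steps
    ]

config = {"warmup_steps": 100, "logging_steps": 50, "save_steps": 100, "eval_steps": 100}
total = optimizer_steps(num_examples=100, batch_size=2, grad_accum=4, epochs=3)
print(f"Total optimizer steps: {total}")  # 36
for warning in schedule_warnings(total, config):
    print("warning:", warning)
```

For the demo data this means the learning rate never finishes warming up and no intermediate evaluation or checkpoint is produced, so for a small dataset values such as warmup_steps of 5 and eval_steps of 10 are a more sensible starting point.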
      

      Create scripts/requirements.txt:

      transformers==4.36.0
      datasets==2.14.0
      accelerate==0.25.0
      peft==0.7.0
      bitsandbytes==0.41.3
      torch==2.1.0
      scikit-learn
      sentencepiece
      protobuf
      einops
      

      Step 7: Create SageMaker Entry Point Script

      Create scripts/sagemaker_entry.py:

      #!/usr/bin/env python3
      # scripts/sagemaker_entry.py
      
      import os
      import sys
      import subprocess
      import argparse
      
      def install_requirements():
          """Install required packages"""
          print("Installing requirements...")
          subprocess.check_call([
              sys.executable, "-m", "pip", "install",
              "-r", "/opt/ml/code/requirements.txt"
          ])
      
      def main():
          parser = argparse.ArgumentParser()
          parser.add_argument(
              "--train", 
              action="store_true",
              help="Run training"
          )
          parser.add_argument(
              "--serve", 
              action="store_true",
              help="Run serving"
          )
      
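          # SageMaker passes each hyperparameter as "--name value", so tolerate
          # arguments this parser does not define instead of exiting on them
          args, _unknown = parser.parse_known_args()
      
          if args.serve:
              print("Serving mode - this would load the model for inference")
              # For SageMaker deployment
          else:
              # Train by default: the launch script never passes --train explicitly
              # Install dependencies first
              install_requirements()
      
              # Run training
              print("Starting training...")
              from train import main as train_main
              train_main()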
      
      if __name__ == "__main__":
          main()
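
SageMaker hands the estimator's hyperparameters to the entry point as command-line arguments of the form `--name value`. A training script that should honour them therefore needs to parse them and overlay them on the JSON config. The following is one possible sketch of that overlay; the function `load_config_with_overrides` is hypothetical and is not part of train.py as written.

```python
import argparse
import json

def load_config_with_overrides(base_config: dict, argv: list) -> dict:
    """Overlay SageMaker-style CLI hyperparameters on top of a base config."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--num_epochs", type=int)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    parser.add_argument("--lora_r", type=int)
    # parse_known_args ignores unrecognised arguments instead of exiting
    args, _unknown = parser.parse_known_args(argv)
    overrides = {key: value for key, value in vars(args).items() if value is not None}
    return {**base_config, **overrides}

base = {
    "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
    "num_epochs": 3,
    "batch_size": 2,
    "learning_rate": 2e-4,
    "lora_r": 16,
    "max_length": 512,
}
# Arguments in the form SageMaker would pass for hyperparameters={"num_epochs": "1", ...}
argv = ["--num_epochs", "1", "--learning_rate", "1e-4", "--some_other_flag", "x"]
print(json.dumps(load_config_with_overrides(base, argv), indent=2))
```

In a real script, `argv` would be `sys.argv[1:]`, and the merged dictionary would replace `self.config` in the trainer.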
      

      Phase 3: Launch Training (15 minutes)

      Step 8: Create Launch Script

      Create launch_training.py:

      #!/usr/bin/env python3
      # launch_training.py
      
      import os
      import sys
      import json
      import boto3
      import time
      from datetime import datetime
      from sagemaker.huggingface import HuggingFace, get_huggingface_llm_image_uri
      
      def create_training_job():
          """Create and launch SageMaker training job"""
      
          # Configuration
          config = {
              "job_name": f"llama-finetune-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
              "instance_type": "ml.g5.2xlarge",  # 1x NVIDIA A10G (24 GB): enough for 4-bit LoRA on a 7B-8B model
              "instance_count": 1,
              "volume_size": 200,  # GB
              "max_run_hours": 4,
              "use_spot_instances": True,
              "max_wait_hours": 8,
              "bucket_name": "llama3-finetune-1234567890",  # Your bucket from earlier
              "role_arn": None,  # Will get from SageMaker
          }
      
          # Initialize session
          session = boto3.Session()
          sagemaker_session = session.client('sagemaker')  # low-level SageMaker API client
      
          # Get SageMaker execution role
          if not config["role_arn"]:
              # Try to get default role
              try:
                  iam = boto3.client('iam')
                  roles = iam.list_roles(PathPrefix='/service-role/')
                  for role in roles['Roles']:
                      if 'AmazonSageMaker-ExecutionRole' in role['RoleName']:
                          config["role_arn"] = role['Arn']
                          break
              except Exception as e:
                  print(f"Could not look up an execution role automatically: {e}")
      
              if not config["role_arn"]:
                  print("No SageMaker role found. Creating one...")
                  # You'll need to create this through AWS Console first
                  print("Please create a SageMaker execution role:")
                  print("1. Go to IAM Console")
                  print("2. Create role")
                  print("3. Select 'SageMaker' as use case")
                  print("4. Attach policies: AmazonSageMakerFullAccess, AmazonS3FullAccess")
                  print("5. Name: AmazonSageMaker-ExecutionRole")
                  print("6. Copy the ARN and paste it below")
                  config["role_arn"] = input("Enter SageMaker Execution Role ARN: ")
      
          # Create HuggingFace estimator
          print(f"Creating training job: {config['job_name']}")
      
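          # Hyperparameters: recorded on the job and passed to the entry point as CLI
          # arguments; train.py as written still reads its values from training_config.json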
          hyperparameters = {
              "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
              "num_epochs": "3",
              "batch_size": "2",
              "learning_rate": "2e-4",
              "lora_r": "16",
          }
      
          # Environment variables
          environment = {
              "HF_TOKEN": os.environ.get("HF_TOKEN", ""),  # For Llama 3 access
              "MODEL_CACHE": "/opt/ml/model",
          }
      
          # Create estimator
          estimator = HuggingFace(
              entry_point="sagemaker_entry.py",
              source_dir="scripts",
              instance_type=config["instance_type"],
              instance_count=config["instance_count"],
              volume_size=config["volume_size"],
              role=config["role_arn"],
              transformers_version="4.36.0",
              pytorch_version="2.1.0",
              py_version="py310",
              hyperparameters=hyperparameters,
              environment=environment,
              max_run=config["max_run_hours"] * 3600,
              use_spot_instances=config["use_spot_instances"],
              max_wait=config["max_wait_hours"] * 3600 if config["use_spot_instances"] else None,
              output_path=f"s3://{config['bucket_name']}/outputs/",
              code_location=f"s3://{config['bucket_name']}/code/",
              disable_profiler=True,
              debugger_hook_config=False,
          )
      
          # Define input data configuration
          inputs = {
              "train": f"s3://{config['bucket_name']}/data/train/",
              "validation": f"s3://{config['bucket_name']}/data/validation/",
          }
      
          # Launch training job
          print("Launching training job...")
          estimator.fit(inputs, job_name=config["job_name"], wait=False)
      
          # Get job details
          job_description = sagemaker_session.describe_training_job(
              TrainingJobName=config["job_name"]
          )
      
          print(f"\n✅ Training job launched successfully!")
          print(f"Job Name: {config['job_name']}")
          print(f"Job ARN: {job_description['TrainingJobArn']}")
          print(f"Instance: {config['instance_type']}")
          print(f"Spot Instances: {config['use_spot_instances']}")
          print(f"Estimated cost: ${estimate_cost(config['instance_type'], config['max_run_hours'])}")
          print(f"\nMonitor job at: https://{session.region_name}.console.aws.amazon.com/sagemaker/home?region={session.region_name}#/training-jobs/{config['job_name']}")
      
          return config["job_name"]
      
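      def estimate_cost(instance_type, hours):
          """Rough upper-bound cost estimate for a spot job that runs for the given hours"""
          # Illustrative hourly rates in USD; prices vary by region and change over
          # time, so check the SageMaker pricing page before relying on them
          pricing = {
              "ml.g5.2xlarge": 1.212,  # per hour
              "ml.g5.4xlarge": 2.176,
              "ml.g5.8xlarge": 4.352,
              "ml.g5.12xlarge": 6.528,
          }
      
          base_cost = pricing.get(instance_type, 1.5) * hours
          spot_cost = base_cost * 0.3  # assumes a ~70% spot discount; actual savings vary
      
          return round(spot_cost, 2)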
      
      def monitor_job(job_name):
          """Monitor training job progress"""
          client = boto3.client('sagemaker')
      
          print(f"\nMonitoring job: {job_name}")
          print("=" * 50)
      
          status = "InProgress"
          while status in ["InProgress", "Stopping"]:  # the non-terminal TrainingJobStatus values
              try:
                  response = client.describe_training_job(TrainingJobName=job_name)
                  status = response['TrainingJobStatus']
      
                  if 'TrainingStartTime' in response:
                      elapsed = (time.time() - response['TrainingStartTime'].timestamp()) / 60
                      print(f"Status: {status} | Elapsed: {elapsed:.1f} min", end='\r')
      
                  if 'FinalMetricDataList' in response:
                      for metric in response['FinalMetricDataList']:
                          print(f"{metric['MetricName']}: {metric['Value']}")
      
                  time.sleep(30)
      
              except Exception as e:
                  print(f"\nError monitoring: {e}")
                  break
      
          print(f"\nFinal Status: {status}")
      
          if status == "Completed":
              print("✅ Training completed successfully!")
              print(f"Model artifacts: {response.get('ModelArtifacts', {}).get('S3ModelArtifacts', 'N/A')}")
          elif status == "Failed":
              print("❌ Training failed!")
              print(f"Failure reason: {response.get('FailureReason', 'Unknown')}")
      
          return status
      
      def main():
          """Main function"""
          print("=" * 60)
          print("Llama 3 Fine-Tuning on SageMaker - Launch Script")
          print("=" * 60)
      
          # Step 1: Create training job
          job_name = create_training_job()
      
          # Step 2: Ask if user wants to monitor
          monitor = input("\nDo you want to monitor the job? (yes/no): ").lower()
          if monitor in ['yes', 'y']:
              monitor_job(job_name)
      
          # Step 3: Show next steps
          print("\n" + "=" * 60)
          print("NEXT STEPS:")
          print("=" * 60)
          print("1. Wait for training to complete (2-4 hours)")
          print("2. Check S3 for model artifacts:")
          print(f"   aws s3 ls s3://<your-bucket>/outputs/{job_name}/")  # bucket names cannot contain wildcards
          print("3. Deploy the model:")
          print("   python deploy_model.py --job-name " + job_name)
          print("\nTo check status manually:")
          print(f"   aws sagemaker describe-training-job --training-job-name {job_name}")
      
      if __name__ == "__main__":
          main()
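
The launch script enables spot instances, but without checkpoints an interrupted job starts again from the beginning. The SageMaker estimator accepts `checkpoint_s3_uri` and `checkpoint_local_path` for this purpose, and it requires `max_wait` to be at least `max_run`. Below is a minimal sketch of a helper that assembles those keyword arguments. The helper name, bucket, and job name are hypothetical, and the training script would still need to write its checkpoints to the local path and resume from them.

```python
def spot_training_kwargs(bucket: str, job_name: str,
                         max_run_hours: float, max_wait_hours: float) -> dict:
    """Build the Estimator keyword arguments that relate to managed spot training."""
    max_run = int(max_run_hours * 3600)
    max_wait = int(max_wait_hours * 3600)
    if max_wait < max_run:
        raise ValueError("max_wait must be at least max_run when using spot instances")
    return {
        "use_spot_instances": True,
        "max_run": max_run,
        "max_wait": max_wait,
        # SageMaker syncs this S3 prefix with checkpoint_local_path in the container
        "checkpoint_s3_uri": f"s3://{bucket}/checkpoints/{job_name}/",
        "checkpoint_local_path": "/opt/ml/checkpoints",
    }

kwargs = spot_training_kwargs("llama3-finetune-example", "llama-finetune-demo", 4, 8)
print(kwargs)
# Usage (not executed here): HuggingFace(entry_point=..., role=..., **kwargs)
```

With this in place, pointing `TrainingArguments(output_dir=...)` at /opt/ml/checkpoints and calling `trainer.train(resume_from_checkpoint=...)` when a checkpoint exists would let a restarted job continue where it stopped.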
      

      Step 9: Run the Training!

      # Make scripts executable
      chmod +x launch_training.py
      chmod +x scripts/*.py
      
      # Run the launch script
      python launch_training.py
      
      # Or run directly with minimal setup
      python -c "
      import boto3
      from sagemaker.huggingface import HuggingFace
      
      # Quick start - minimal configuration
      estimator = HuggingFace(
          entry_point='train.py',
          source_dir='scripts',
          instance_type='ml.g5.2xlarge',
          instance_count=1,
          role='your-sagemaker-role-arn',  # Replace with your role
          transformers_version='4.36',
          pytorch_version='2.1',
          py_version='py310',
          # Recorded on the job; train.py as written reads its settings from training_config.json
          hyperparameters={
              'model_name': 'mistralai/Mistral-7B-Instruct-v0.1',
              'num_epochs': 1,  # Start with 1 epoch for testing
          }
      )
      
      # Start training
      estimator.fit({
          'train': 's3://your-bucket/data/train/',
          'validation': 's3://your-bucket/data/validation/'
      }, wait=True)
      "
      

      Phase 4: Monitor & Deploy (After Training Completes)

      Step 10: Check Training Results

      Create check_results.py:

      #!/usr/bin/env python3
      # check_results.py
      
      import boto3
      import json
      from datetime import datetime
      
      def check_training_job(job_name):
          """Check training job status and results"""
          client = boto3.client('sagemaker')
      
          try:
              response = client.describe_training_job(TrainingJobName=job_name)
      
              print(f"Job Name: {response['TrainingJobName']}")
              print(f"Status: {response['TrainingJobStatus']}")
              print(f"Creation Time: {response['CreationTime']}")
      
              if 'TrainingEndTime' in response and 'TrainingStartTime' in response:
                  print(f"End Time: {response['TrainingEndTime']}")
                  duration = (response['TrainingEndTime'] - response['TrainingStartTime']).total_seconds() / 3600
                  print(f"Duration: {duration:.2f} hours")
      
              if 'ModelArtifacts' in response:
                  print(f"\nModel Artifacts: {response['ModelArtifacts']['S3ModelArtifacts']}")
      
              if 'FinalMetricDataList' in response:
                  print("\nFinal Metrics:")
                  for metric in response['FinalMetricDataList']:
                      print(f"  {metric['MetricName']}: {metric['Value']:.4f}")
      
              # Check for Spot training savings
              if response.get('EnableManagedSpotTraining', False):
                  billable_time = response.get('BillableTimeInSeconds', 0)
                  total_time = response.get('TrainingTimeInSeconds', 0)
                  if total_time > 0:
                      savings = (1 - (billable_time / total_time)) * 100
                      print(f"\nSpot Training Savings: {savings:.1f}%")
                      print(f"Billable time: {billable_time/3600:.1f}h")
                      print(f"Total time: {total_time/3600:.1f}h")
      
              # Estimate cost
              instance_type = response['ResourceConfig']['InstanceType']
              duration_hours = response.get('TrainingTimeInSeconds', 0) / 3600
      
              # Approximate us-east-1 on-demand rates in USD/hour; confirm on the SageMaker pricing page
              pricing = {
                  'ml.g5.2xlarge': 1.515,
                  'ml.g5.4xlarge': 2.03,
                  'ml.g5.8xlarge': 3.06,
              }
      
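              hourly_rate = pricing.get(instance_type, 1.5)
      
              # BillableTimeInSeconds already reflects any Spot discount, so bill it at the
              # on-demand rate (per instance) instead of applying a second flat discount
              billable_seconds = response.get(
                  'BillableTimeInSeconds', response.get('TrainingTimeInSeconds', 0)
              )
              instance_count = response['ResourceConfig'].get('InstanceCount', 1)
              cost = hourly_rate * instance_count * billable_seconds / 3600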
      
              print(f"\nEstimated Cost: ${cost:.2f}")
      
              return response
      
          except Exception as e:
              print(f"Error: {e}")
              return None
      
      def download_model(job_name, local_dir="model_output"):
          """Download trained model from S3"""
          import os
          from urllib.parse import urlparse
          import tarfile
      
          # Get model artifacts location
          client = boto3.client('sagemaker')
          response = client.describe_training_job(TrainingJobName=job_name)
      
          if 'ModelArtifacts' not in response:
              print("No model artifacts found")
              return None
      
          s3_path = response['ModelArtifacts']['S3ModelArtifacts']
      
          # Parse S3 URL
          parsed = urlparse(s3_path)
          bucket = parsed.netloc
          key = parsed.path.lstrip('/')
      
          # Create local directory
          os.makedirs(local_dir, exist_ok=True)
      
          # Download file
          local_file = os.path.join(local_dir, 'model.tar.gz')
      
          print(f"Downloading model from s3://{bucket}/{key}")
          print(f"To: {local_file}")
      
          s3 = boto3.client('s3')
          s3.download_file(bucket, key, local_file)
      
          # Extract if it's a tar file
          if local_file.endswith('.tar.gz'):
              print("Extracting model...")
              with tarfile.open(local_file, 'r:gz') as tar:
                  tar.extractall(path=local_dir)
      
              # Remove tar file
              os.remove(local_file)
      
          print(f"Model downloaded to: {local_dir}")
      
          # List contents
          print("\nModel contents:")
          for root, dirs, files in os.walk(local_dir):
              for file in files[:10]:  # Show first 10 files
                  print(f"  {os.path.join(root, file)}")
      
          return local_dir
      
      if __name__ == "__main__":
          import sys
      
          if len(sys.argv) > 1:
              job_name = sys.argv[1]
          else:
              job_name = input("Enter training job name: ")
      
          print(f"Checking job: {job_name}")
          print("=" * 60)
      
          result = check_training_job(job_name)
      
          if result and result['TrainingJobStatus'] == 'Completed':
              download = input("\nDownload model? (yes/no): ").lower()
              if download in ['yes', 'y']:
                  download_model(job_name)
      

      Run it:

      # After training completes
      python check_results.py your-job-name-here
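To make the Spot numbers in that output concrete, here is a small worked example of the same arithmetic on a hand-written response dictionary. It is a sketch: the sample values and the hourly rate of 1.5 are made up for illustration, and `spot_summary` is a hypothetical helper, not part of boto3. The field names match those read from `describe_training_job` in the script above.

```python
def spot_summary(response, hourly_rate):
    """Compute Spot savings and an estimated cost from a describe_training_job-style dict."""
    training_seconds = response.get("TrainingTimeInSeconds", 0)
    billable_seconds = response.get("BillableTimeInSeconds", training_seconds)
    instance_count = response.get("ResourceConfig", {}).get("InstanceCount", 1)

    # Savings compare what was billed with how long the job actually trained.
    if training_seconds:
        savings_pct = (1 - billable_seconds / training_seconds) * 100
    else:
        savings_pct = 0.0

    # Billable time is wall-clock time, so multiply by the number of instances.
    cost = hourly_rate * instance_count * billable_seconds / 3600
    return {"savings_pct": round(savings_pct, 1), "estimated_cost": round(cost, 2)}


# Illustrative values only: a 2-hour job billed for 36 minutes.
sample_response = {
    "TrainingTimeInSeconds": 7200,
    "BillableTimeInSeconds": 2160,
    "ResourceConfig": {"InstanceType": "ml.g5.2xlarge", "InstanceCount": 1},
}
summary = spot_summary(sample_response, hourly_rate=1.5)
print(summary)
```

Because the billable time already includes the discount, the cost is simply billable hours times the on-demand rate.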
      

      Step 11: Deploy the Model

      Create deploy_model.py:

      #!/usr/bin/env python3
      # deploy_model.py
      
      import boto3
      import json
      import time
      from sagemaker.huggingface import HuggingFaceModel
      from sagemaker import Session
      
      def deploy_finetuned_model(job_name, endpoint_name=None):
          """Deploy the fine-tuned model to a SageMaker endpoint"""
      
          # Initialize
          session = Session()
          region = session.boto_region_name
      
          if not endpoint_name:
              endpoint_name = f"ft-{job_name[:30]}"  # Limit to 30 chars
      
          print(f"Deploying model from job: {job_name}")
          print(f"Endpoint name: {endpoint_name}")
          print(f"Region: {region}")
      
          # Get model artifacts location
          sm_client = boto3.client('sagemaker', region_name=region)
      
          model_s3_path = None
      
          try:
              job_info = sm_client.describe_training_job(TrainingJobName=job_name)
              model_s3_path = job_info['ModelArtifacts']['S3ModelArtifacts']
      
              print(f"Model artifacts: {model_s3_path}")
      
          except Exception as e:
              print(f"Error getting job info: {e}")
              print("Trying to find model in S3...")
      
              # Fall back to the bucket and prefix naming used earlier in this guide
              s3_client = boto3.client('s3')
              bucket = f"llama3-finetune-{job_name.split('-')[-1]}"
              prefix = f"outputs/{job_name}/"
      
              try:
                  response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
                  for obj in response.get('Contents', []):
                      if obj['Key'].endswith('output/model.tar.gz'):
                          model_s3_path = f"s3://{bucket}/{obj['Key']}"
                          break
              except Exception as s3_error:
                  print(f"Could not list s3://{bucket}/{prefix}: {s3_error}")
      
          # Ask for the path if neither lookup found the artifact
          if not model_s3_path:
              model_s3_path = input("Enter full S3 path to model.tar.gz: ")
      
          # Create HuggingFace model
          print("\nCreating model object...")
      
          huggingface_model = HuggingFaceModel(
              model_data=model_s3_path,
              role='your-sagemaker-role-arn',  # Replace with your role
              transformers_version='4.36.0',
              pytorch_version='2.1.0',
              py_version='py310',
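              env={
                  # Weights come from model_data; setting HF_MODEL_ID here would make the
                  # container pull the base model from the Hub instead of the fine-tuned one
                  'HF_TASK': 'text-generation',
                  'SM_NUM_GPUS': '1',
                  'MAX_INPUT_LENGTH': '512',
                  'MAX_TOTAL_TOKENS': '1024',
              }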
          )
      
          # Deploy to endpoint
          print("Deploying endpoint (this will take 5-10 minutes)...")
      
          predictor = huggingface_model.deploy(
              initial_instance_count=1,
              instance_type='ml.g5.xlarge',  # Smaller than training instance
              endpoint_name=endpoint_name,
              wait=True
          )
      
          print(f"\n✅ Endpoint deployed successfully!")
          print(f"Endpoint name: {endpoint_name}")
          print(f"Instance type: ml.g5.xlarge")
          print(f"Endpoint ARN: {sm_client.describe_endpoint(EndpointName=endpoint_name)['EndpointArn']}")
      
          # Test the endpoint
          print("\nTesting endpoint...")
      
          test_prompt = {
              "inputs": "### Instruction:\nHow do I reset my password?\n\n### Response:",
              "parameters": {
                  "max_new_tokens": 200,
                  "temperature": 0.7,
                  "top_p": 0.9,
                  "do_sample": True
              }
          }
      
          try:
              response = predictor.predict(test_prompt)
              print("Test response:")
              print(json.dumps(response, indent=2)[:500] + "...")
      
          except Exception as e:
              print(f"Test failed: {e}")
      
          return predictor
      
      def test_endpoint(endpoint_name):
          """Test an existing endpoint"""
          runtime = boto3.client('sagemaker-runtime')
      
          prompt = {
              "inputs": "### Instruction:\nWhat's your refund policy?\n\n### Response:",
              "parameters": {
                  "max_new_tokens": 100,
                  "temperature": 0.1  # Lower temperature for more focused responses
              }
          }
      
          response = runtime.invoke_endpoint(
              EndpointName=endpoint_name,
              ContentType='application/json',
              Body=json.dumps(prompt)
          )
      
          result = json.loads(response['Body'].read().decode())
          print("Response from endpoint:")
          print(result[0]['generated_text'])
      
          return result
      
      def cleanup(endpoint_name):
          """Delete endpoint to stop charges"""
          print(f"Deleting endpoint: {endpoint_name}")
      
          sm_client = boto3.client('sagemaker')
      
          try:
              # Look up the endpoint config first; it cannot be resolved once the endpoint is gone
              endpoint_info = sm_client.describe_endpoint(EndpointName=endpoint_name)
              config_name = endpoint_info['EndpointConfigName']
      
              sm_client.delete_endpoint(EndpointName=endpoint_name)
              print(f"Endpoint {endpoint_name} deleted")
      
              # Also delete endpoint config
              try:
                  sm_client.delete_endpoint_config(EndpointConfigName=config_name)
                  print(f"Endpoint config {config_name} deleted")
              except Exception as e:
                  print(f"Could not delete endpoint config {config_name}: {e}")
      
          except Exception as e:
              print(f"Error deleting endpoint: {e}")
      
      if __name__ == "__main__":
          import argparse
      
          parser = argparse.ArgumentParser(description="Deploy fine-tuned model")
          parser.add_argument("--job-name", help="Training job name (required to deploy)")
          parser.add_argument("--endpoint-name", help="Endpoint name (optional when deploying)")
          parser.add_argument("--test", action="store_true", help="Test existing endpoint")
          parser.add_argument("--cleanup", action="store_true", help="Delete endpoint")
      
          args = parser.parse_args()
      
          # --test and --cleanup act on an existing endpoint, so they need its name
          if (args.test or args.cleanup) and not args.endpoint_name:
              parser.error("--endpoint-name is required with --test or --cleanup")
      
          if args.cleanup:
              cleanup(args.endpoint_name)
      
          elif args.test:
              test_endpoint(args.endpoint_name)
      
          elif args.job_name:
              deploy_finetuned_model(args.job_name, args.endpoint_name)
      
          else:
              parser.error("--job-name is required to deploy a model")
      

      Run deployment:

      # Deploy the model
      python deploy_model.py --job-name your-training-job-name
      
      # Test the endpoint
      python deploy_model.py --test --endpoint-name ft-your-job-name
      
      # Clean up (important to avoid charges!)
      python deploy_model.py --cleanup --endpoint-name ft-your-job-name
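Both test functions above send prompts in the same `### Instruction` / `### Response` layout, and text-generation endpoints typically return the prompt followed by the completion in `generated_text`. The sketch below builds the payload and strips the echoed prompt from a response. `build_payload` and `extract_answer` are hypothetical helper names, and the fake response stands in for a real endpoint call.

```python
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:"


def build_payload(instruction, max_new_tokens=200, temperature=0.7):
    """Build a request body in the prompt format used by the test functions above."""
    return {
        "inputs": PROMPT_TEMPLATE.format(instruction=instruction),
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": temperature > 0,
        },
    }


def extract_answer(result, prompt):
    """Return only the completion, dropping the prompt if the endpoint echoed it back."""
    text = result[0]["generated_text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


payload = build_payload("How do I reset my password?")

# Stand-in for predictor.predict(payload); no endpoint is called here.
fake_result = [{"generated_text": payload["inputs"] + " Open Settings and choose Reset."}]
print(extract_answer(fake_result, payload["inputs"]))
```

Keeping the template in one place means the inference prompt cannot drift away from the format the model saw during fine-tuning.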
      

      Phase 5: Production Considerations

      Step 12: Create Production Setup Script

      Create production_setup.py:

      #!/usr/bin/env python3
      # production_setup.py
      
      import json
      import os
      from pathlib import Path
      
      def create_ci_cd_pipeline():
          """Create CI/CD pipeline configuration"""
      
          pipeline_config = {
              "name": "llama-finetune-pipeline",
              "stages": [
                  {
                      "name": "DataValidation",
                      "script": "scripts/validate_data.py",
                      "instance": "ml.m5.large",
                      "timeout": 1800
                  },
                  {
                      "name": "Training",
                      "script": "scripts/train.py",
                      "instance": "ml.g5.2xlarge",
                      "use_spot": True,
                      "hyperparameters": {
                          "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
                          "num_epochs": 3,
                          "learning_rate": "2e-4"
                      }
                  },
                  {
                      "name": "Evaluation",
                      "script": "scripts/evaluate.py",
                      "instance": "ml.g5.xlarge",
                      "metrics": ["accuracy", "perplexity", "bleu"]
                  },
                  {
                      "name": "Deployment",
                      "condition": "evaluation.accuracy > 0.85",
                      "instance": "ml.g5.xlarge",
                      "auto_scale": {
                          "min_capacity": 1,
                          "max_capacity": 5
                      }
                  }
              ],
              "monitoring": {
                  "cloudwatch_metrics": [
                      "Invocations",
                      "ModelLatency",
                      "CPUUtilization",
                      "MemoryUtilization"
                  ],
                  "alarms": [
                      {
                          "metric": "ModelLatency",
                          "threshold": 1000,  # ms
                          "periods": 2
                      },
                      {
                          "metric": "Invocations",
                          "threshold": 1000,  # per minute
                          "periods": 5
                      }
                  ]
              },
              "cost_tracking": {
                  "daily_budget": 50,
                  "alarm_threshold": 80,
                  "report_frequency": "daily"
              }
          }
      
          # Save pipeline config
          with open('pipeline_config.json', 'w') as f:
              json.dump(pipeline_config, f, indent=2)
      
          print("✅ CI/CD pipeline configuration created")
          print("Next steps:")
          print("1. Review pipeline_config.json")
          print("2. Set up CodePipeline in AWS Console")
          print("3. Configure S3 triggers for automatic retraining")
          print("4. Set up CloudWatch alarms for monitoring")
      
          return pipeline_config
      
      def create_monitoring_dashboard():
          """Create CloudWatch dashboard configuration"""
      
          dashboard = {
              "widgets": [
                  {
                      "type": "metric",
                      "properties": {
                          "metrics": [
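                              # Invocation metrics are published per variant; "AllTraffic" is the SDK default
                              ["AWS/SageMaker", "Invocations", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"],
                              ["AWS/SageMaker", "ModelLatency", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"]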
                          ],
                          "view": "timeSeries",
                          "stacked": False,
                          "region": "us-east-1",
                          "title": "Endpoint Performance"
                      }
                  },
                  {
                      "type": "metric",
                      "properties": {
                          "metrics": [
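                              # Instance metrics live in a separate namespace from invocation metrics
                              ["/aws/sagemaker/Endpoints", "CPUUtilization", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"],
                              ["/aws/sagemaker/Endpoints", "MemoryUtilization", "EndpointName", "your-endpoint", "VariantName", "AllTraffic"]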
                          ],
                          "view": "gauge",
                          "region": "us-east-1",
                          "title": "Resource Utilization"
                      }
                  },
                  {
                      "type": "text",
                      "properties": {
                          "markdown": "# Fine-Tuned Model Dashboard\n\n## Key Metrics\n- **Cost Today**: $12.45\n- **Total Invocations**: 12,345\n- **Avg Latency**: 245ms\n- **Error Rate**: 0.12%\n\n## Actions\n- [View Detailed Logs](https://console.aws.amazon.com/cloudwatch/home)\n- [Open SageMaker Console](https://console.aws.amazon.com/sagemaker/home)"
                      }
                  }
              ]
          }
      
          with open('dashboard_config.json', 'w') as f:
              json.dump(dashboard, f, indent=2)
      
          print("✅ Dashboard configuration created")
      
          return dashboard
      
      def create_cost_estimator():
          """Create cost estimation tool"""
      
          # Approximate us-east-1 on-demand rates in USD/hour (check the SageMaker pricing page);
          # "spot" assumes a 70% discount, and each "cost" is on-demand rate x hours
          estimator = {
              "instance_pricing": {
                  "ml.g5.xlarge": {"on_demand": 1.408, "spot": 0.4224},
                  "ml.g5.2xlarge": {"on_demand": 1.515, "spot": 0.4545},
                  "ml.g5.4xlarge": {"on_demand": 2.03, "spot": 0.609},
                  "ml.g5.8xlarge": {"on_demand": 3.06, "spot": 0.918},
                  "ml.g5.12xlarge": {"on_demand": 7.09, "spot": 2.127}
              },
              "training_estimator": {
                  "small": {"instances": "ml.g5.2xlarge", "hours": 4, "cost": 6.06},
                  "medium": {"instances": "ml.g5.4xlarge", "hours": 8, "cost": 16.24},
                  "large": {"instances": "ml.g5.8xlarge", "hours": 16, "cost": 48.96}
              },
              "inference_estimator": {
                  "low_traffic": {"instances": "ml.g5.xlarge", "hours": 24, "cost": 33.79},
                  "medium_traffic": {"instances": "ml.g5.2xlarge", "hours": 24, "cost": 36.36},
                  "high_traffic": {"instances": "ml.g5.4xlarge", "hours": 24, "cost": 48.72}
              }
          }
      
          with open('cost_estimator.json', 'w') as f:
              json.dump(estimator, f, indent=2)
      
          print("✅ Cost estimator created")
      
          # Create simple Python calculator
          calculator_code = '''
      def estimate_training_cost(instance_type, hours, use_spot=True):
          """Estimate training cost in USD"""
          # Approximate us-east-1 on-demand rates (USD/hour); check the SageMaker pricing page
          pricing = {
              "ml.g5.xlarge": 1.408,
              "ml.g5.2xlarge": 1.515,
              "ml.g5.4xlarge": 2.03,
              "ml.g5.8xlarge": 3.06,
          }
      
          hourly = pricing.get(instance_type, 2.0)
          if use_spot:
              hourly *= 0.3  # Assumes a 70% Spot discount; actual savings vary
      
          return hourly * hours
      
      def estimate_monthly_inference(instance_type, requests_per_day, avg_latency_ms=200):
          """Estimate monthly cost of an always-on real-time endpoint"""
          import math
      
          pricing = {
              "ml.g5.xlarge": 1.408,
              "ml.g5.2xlarge": 1.515,
          }
      
          # Hours per day spent serving requests, with a 20% buffer
          busy_hours = requests_per_day * (avg_latency_ms / 1000) / 3600 * 1.2
      
          # Real-time endpoints bill for every hour they are up, not per request,
          # so size by instance count and charge 24 hours per instance
          instances = max(1, math.ceil(busy_hours / 24))
      
          hourly = pricing.get(instance_type, 1.5)
          daily_cost = hourly * 24 * instances
          monthly_cost = daily_cost * 30
      
          return {
              "daily_cost": round(daily_cost, 2),
              "monthly_cost": round(monthly_cost, 2),
              "instance_hours_per_day": 24 * instances
          }
      '''
      
          with open('cost_calculator.py', 'w') as f:
              f.write(calculator_code)
      
          return estimator
      
      if __name__ == "__main__":
          print("Setting up production configuration...")
          print("=" * 60)
      
          # Create all configurations
          pipeline = create_ci_cd_pipeline()
          dashboard = create_monitoring_dashboard()
          cost_config = create_cost_estimator()
      
          print("\n" + "=" * 60)
          print("PRODUCTION SETUP COMPLETE")
          print("=" * 60)
          print("\nCreated files:")
          print("1. pipeline_config.json - CI/CD pipeline configuration")
          print("2. dashboard_config.json - CloudWatch dashboard")
          print("3. cost_estimator.json - Cost estimation data")
          print("4. cost_calculator.py - Python cost calculator")
      
          print("\nNext steps for production:")
          print("1. Set up AWS Budgets with alerts")
          print("2. Configure VPC for private endpoint access")
          print("3. Set up logging to S3 for compliance")
          print("4. Implement A/B testing for model versions")
          print("5. Create automated retraining pipeline")
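The `alarms` entries in pipeline_config.json are only data until something turns them into CloudWatch alarms. Below is a minimal sketch of that translation, producing keyword arguments for the CloudWatch `put_metric_alarm` call. Several things here are assumptions: the helper name `alarm_kwargs`, the alarm naming scheme, the 60-second period, and the variant name `AllTraffic` (the SageMaker SDK default). One detail worth noting is that CloudWatch reports `ModelLatency` in microseconds, while the config above states its threshold in milliseconds.

```python
def alarm_kwargs(alarm, endpoint_name, variant_name="AllTraffic"):
    """Translate one alarm entry from pipeline_config.json into put_metric_alarm arguments."""
    threshold = alarm["threshold"]
    if alarm["metric"] == "ModelLatency":
        # Config value is in milliseconds; CloudWatch reports microseconds.
        threshold = threshold * 1000

    return {
        "AlarmName": f"{endpoint_name}-{alarm['metric']}",  # hypothetical naming scheme
        "Namespace": "AWS/SageMaker",
        "MetricName": alarm["metric"],
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        # Counts are summed per period; latency is averaged.
        "Statistic": "Sum" if alarm["metric"] == "Invocations" else "Average",
        "Period": 60,  # seconds; assumed, matches the per-minute comment in the config
        "EvaluationPeriods": alarm["periods"],
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


latency_alarm = alarm_kwargs(
    {"metric": "ModelLatency", "threshold": 1000, "periods": 2}, "your-endpoint"
)
print(latency_alarm["AlarmName"], latency_alarm["Threshold"])
```

With credentials configured, each dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**latency_alarm)`; adding `AlarmActions` with an SNS topic ARN would make the alarm notify someone.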
      

      Troubleshooting Common Issues

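      Issue 1: "CUDA out of memory"

      # Add to the TrainingArguments in your training script to cut GPU memory use.
      # (A literal "No space left on device" error means the disk is full instead:
      # raise volume_size on the estimator.)
      training_args = TrainingArguments(
          gradient_checkpointing=True,  # Recomputes activations to save memory
          gradient_accumulation_steps=4,  # Simulates a 4x larger batch
          fp16=False,  # Use bf16 instead
          bf16=True,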
      )
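The comment "simulates larger batch" comes down to simple arithmetic: the optimizer sees `per-device batch x accumulation steps x number of GPUs` examples per update. A small sketch of that calculation follows; both function names are hypothetical helpers, not part of `transformers`.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


def accumulation_steps_for(target_batch, per_device_batch, num_gpus=1):
    """Smallest accumulation setting that reaches at least the target batch size."""
    return max(1, math.ceil(target_batch / (per_device_batch * num_gpus)))


# A per-device batch of 2 with 4 accumulation steps on one GPU behaves like a batch of 8.
print(effective_batch_size(2, 4))
# To keep an effective batch of 16 after halving the per-device batch to 2, accumulate 8 steps.
print(accumulation_steps_for(16, 2))
```

This is why lowering the per-device batch size to fit in memory does not have to change the training dynamics: raise the accumulation steps by the same factor.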
      

      Issue 2: Training too slow

      # Switch to a faster instance
      # ml.g5.2xlarge → ml.g5.12xlarge (4 GPUs instead of 1; g5.4xlarge and g5.8xlarge still have a single A10G, so they add CPU and RAM, not GPU speed)
      # Use gradient accumulation instead of larger batch size
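Whether a bigger instance is worth it depends on how the speedup compares with the price increase: the job gets cheaper only when the speedup exceeds the ratio of the hourly rates. The sketch below shows the break-even check. The rates and speedups are made-up numbers for illustration, and both function names are hypothetical.

```python
def job_cost(hourly_rate, baseline_hours, speedup=1.0):
    """Cost of a job that takes baseline_hours / speedup at the given hourly rate."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return hourly_rate * baseline_hours / speedup


def worth_upgrading(base_rate, new_rate, speedup):
    """True when the faster instance finishes the same job for less money."""
    return speedup > new_rate / base_rate


# Illustrative numbers: an instance that costs twice as much per hour.
print(job_cost(2.0, 4))               # baseline job
print(job_cost(4.0, 4, speedup=2.0))  # same total cost, half the wall-clock time
print(worth_upgrading(2.0, 4.0, 1.5)) # 1.5x faster at 2x the price costs more overall
```

Measuring the real speedup on a short run (for example one epoch on a data subset) before committing to the larger instance keeps this comparison honest.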
      

      Issue 3: Model not learning

      # Check your data format
      # Lower learning rate: 2e-4 → 1e-4
      # Increase epochs: 3 → 5
      # Add more diverse training examples
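"Check your data format" is easiest to act on with a quick scan of the training records before uploading them. The sketch below flags records that are not JSON objects or that lack non-empty text fields. The field names `instruction` and `response` are assumptions chosen to mirror the prompt layout used at inference time; adjust them to whatever your data preparation script actually writes.

```python
REQUIRED_FIELDS = ("instruction", "response")  # assumed field names; adjust to your data


def find_bad_records(records, required=REQUIRED_FIELDS):
    """Return (index, reason) pairs for records that would break or weaken training."""
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append((index, "not a JSON object"))
            continue
        missing = [
            field for field in required
            if not isinstance(record.get(field), str) or not record[field].strip()
        ]
        if missing:
            problems.append((index, "missing or empty: " + ", ".join(missing)))
    return problems


# Hand-written sample standing in for the parsed contents of a training file.
sample_records = [
    {"instruction": "How do I reset my password?", "response": "Open Settings and choose Reset."},
    {"instruction": "What's your refund policy?", "response": "   "},
    "a stray string instead of an object",
]
for index, reason in find_bad_records(sample_records):
    print(f"record {index}: {reason}")
```

Loading the real file with `json.load` and passing the resulting list to `find_bad_records` gives a fast sanity check; a model that "is not learning" is often being fed empty or misnamed fields.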
      

      Quick Start - One Command Setup

      Create setup.sh:

      #!/bin/bash
      # setup.sh - Complete setup script
      set -e  # Stop at the first failed step
      
      echo "🚀 Starting Llama 3 Fine-Tuning Setup..."
      echo "=========================================="
      
      # Step 1: Setup environment
      echo "1. Setting up Python environment..."
      python -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt
      
      # Step 2: Prepare data
      echo "2. Preparing sample data..."
      python data/prepare_data.py
      
      # Step 3: Setup AWS (interactive)
      echo "3. Setting up AWS..."
      read -r -p "Enter your SageMaker Role ARN: " ROLE_ARN
      read -r -p "Enter S3 bucket name: " BUCKET_NAME
      # Export both values so child processes such as launch_training.py can read them
      export ROLE_ARN BUCKET_NAME
      
      # Step 4: Upload to S3
      echo "4. Uploading to S3..."
      # Create the bucket only if it does not already exist
      aws s3api head-bucket --bucket "$BUCKET_NAME" 2>/dev/null || aws s3 mb "s3://$BUCKET_NAME"
      aws s3 cp data/train.json "s3://$BUCKET_NAME/data/train/"
      aws s3 cp data/validation.json "s3://$BUCKET_NAME/data/validation/"
      
      # Step 5: Launch training
      echo "5. Launching training job..."
      python launch_training.py
      
      echo "✅ Setup complete!"
      echo "Training job launched. Check AWS Console for progress."
      

      Make it executable and run:

      chmod +x setup.sh
      ./setup.sh
      

      Summary: Your Complete Path

      1. Hour 0-1: Set up AWS, install dependencies, prepare data

      2. Hour 1-2: Configure SageMaker, upload data to S3

      3. Hour 2-3: Launch training job (runs for 2-4 hours)

      4. Hour 6-7: Check results, download model

      5. Hour 7-8: Deploy endpoint, test inference

      6. Hour 8+: Set up monitoring, CI/CD, production features

Total hands-on time: 2-3 hours
Total wait time: 2-4 hours (training) + 10-15 minutes (deployment)
Total cost: $10-50 depending on configuration


Need help? Common issues and solutions:

  1. Permission errors: Make sure your IAM role has the AmazonSageMakerFullAccess policy attached

  2. Out of memory: Reduce batch size, enable gradient checkpointing

  3. Training too slow: Move to a multi-GPU instance such as ml.g5.12xlarge; Spot instances lower cost, not runtime

  4. Model not loading: Check Hugging Face token for Llama 3 access

That covers the full path from account setup to a deployed endpoint. Run the commands in order, replace the placeholder role ARN, bucket, and job names with your own, and delete the endpoint when you finish testing so it stops accruing charges.
