How to Fine-Tune Llama 3 on AWS Without Breaking the Bank: A Practical Guide

Phase 1: Setup & Preparation (30-45 minutes)
Step 1: AWS Account & Permissions Setup
1.1 Log In to the AWS Console
Go to https://aws.amazon.com and sign in
If you are new to AWS, create an account (a free tier exists, but a payment method is required)
1.2 Create IAM User for SageMaker (Don't use root!)
Go to IAM Service
Click "Users" → "Create user"
Username: sagemaker-user
Select "Attach policies directly"
Add these policies:
AmazonSageMakerFullAccess
AmazonS3FullAccess
AWSCloudFormationFullAccess
IAMFullAccess (temporarily, for setup)
Click "Create user"
Go to "Security credentials" tab
Click "Create access key"
Select "Command Line Interface (CLI)"
Copy the Access Key ID and Secret Access Key
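The policy list above marks IAMFullAccess as temporary. Once the setup in this guide is finished, it can be detached again. The snippet below is a minimal sketch of that step, not part of the original guide: the helper name detach_setup_policy is hypothetical, and it assumes you pass in a boto3 IAM client created with credentials that are allowed to change the user's policies.

```python
IAM_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/IAMFullAccess"


def detach_setup_policy(iam_client, user_name="sagemaker-user"):
    """Detach IAMFullAccess from the user if attached; return True if it was detached."""
    # list_attached_user_policies returns the managed policies on the user
    response = iam_client.list_attached_user_policies(UserName=user_name)
    attached_arns = {p["PolicyArn"] for p in response["AttachedPolicies"]}
    if IAM_FULL_ACCESS_ARN not in attached_arns:
        return False
    iam_client.detach_user_policy(UserName=user_name, PolicyArn=IAM_FULL_ACCESS_ARN)
    return True


# Typical use (assumes boto3 is installed and credentials are configured):
#   import boto3
#   detach_setup_policy(boto3.client("iam"))
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.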

1.3 Configure AWS CLI on Your Machine
# Install AWS CLI (if not installed)
# For Mac:
brew install awscli
# For Ubuntu:
sudo apt-get install awscli
# For Windows (PowerShell):
winget install -e --id Amazon.AWSCLI

# Configure AWS CLI
aws configure
# Enter:
# AWS Access Key ID: [paste from step above]
# AWS Secret Access Key: [paste from step above]
# Default region: us-east-1 (or your preferred region)
# Default output format: json

Step 2: Request Model Access
2.1 Get Llama 3 Access on Hugging Face
# 1. Go to https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
# 2. Click "Request Access"
# 3. Fill the form (use your real details)
# 4. Wait for approval (usually within hours)

# Alternative: Use a different open model that doesn't require approval
# We'll use "mistralai/Mistral-7B-Instruct-v0.1" for this tutorial
# No approval needed!
2.2 Create Hugging Face Token (For Llama 3 if approved)
1. Go to https://huggingface.co
2. Sign up/login
3. Click profile → Settings → Access Tokens
4. Click "New token"
5. Name: aws-sagemaker
6. Role: Write (for uploading models if needed)
7. Copy the token

Step 3: Prepare Your Local Environment
3.1 Create Project Directory Structure
mkdir llama3-finetune-tutorial
cd llama3-finetune-tutorial

# Create the directory structure
mkdir -p scripts data configs outputs
mkdir -p docker train deploy monitor

3.2 Create Virtual Environment & Install Dependencies
# Create virtual environment
python -m venv venv

# Activate it
# On Mac/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install required packages
pip install --upgrade pip
pip install boto3 sagemaker awscli
pip install transformers==4.36.0
pip install datasets==2.14.0
pip install peft==0.7.0
pip install accelerate==0.25.0
pip install bitsandbytes==0.41.3
pip install torch==2.1.0
pip install scikit-learn

Step 4: Prepare Training Data
4.1 Create Sample Training Data
# Move into the data directory (created in step 3.1)
cd data

Create prepare_data.py:
# data/prepare_data.py
import json


def create_sample_data():
    """Create example training data for customer service chatbot"""
    examples = [
        {
            "instruction": "How do I reset my account password?",
            "input": "",
            "output": "To reset your password, visit our login page and click 'Forgot Password'. Enter your email address and check your inbox for a reset link. The link expires in 24 hours. If you don't receive the email, check your spam folder or contact support@ourcompany.com."
        },
        {
            "instruction": "What's your refund policy?",
            "input": "Customer purchased item 2 weeks ago",
            "output": "We offer a 30-day money-back guarantee for all products. Since your purchase was 2 weeks ago, you're eligible for a full refund. Please go to 'My Orders' in your account, select the item, and click 'Request Refund'. Refunds are processed within 5-7 business days."
        },
        {
            "instruction": "Do you offer international shipping?",
            "input": "",
            "output": "Yes, we ship to over 50 countries. International shipping takes 7-14 business days and costs $25. Additional customs fees may apply depending on your country's regulations. You can see exact shipping costs at checkout."
        },
        {
            "instruction": "How do I contact customer support?",
            "input": "",
            "output": "You can contact our support team in three ways: 1) Email: support@ourcompany.com (response within 24 hours), 2) Live Chat: Available Mon-Fri 9AM-6PM EST on our website, 3) Phone: 1-800-555-HELP (Mon-Fri 8AM-8PM EST)."
        },
        {
            "instruction": "What payment methods do you accept?",
            "input": "Customer is from Europe",
            "output": "We accept Visa, MasterCard, American Express, PayPal, and Apple Pay. For European customers, we also support SEPA transfers and iDEAL. All payments are processed securely through PCI-compliant systems."
        }
    ]

    # Create 100 synthetic examples (for demo - in real life, use real data)
    all_examples = []
    instruction_examples = []
    password_variations = [
        "I forgot my password",
        "Can't login to my account",
        "Need to change my password"
    ]
    for i in range(100):
        new_example = examples[i % len(examples)].copy()

        # Add some variation
        if "password" in new_example["instruction"].lower():
            new_example["instruction"] = password_variations[i % len(password_variations)]

        # Format for training
        text = f"### Instruction:\n{new_example['instruction']}\n\n"
        if new_example["input"]:
            text += f"### Input:\n{new_example['input']}\n\n"
        text += f"### Response:\n{new_example['output']}"
        all_examples.append({"text": text})

        # Keep the raw pair as well. Recovering it from `text` by line index
        # does not work: line 0 is only the "### Instruction:" header.
        instruction_examples.append({
            "instruction": new_example["instruction"],
            "response": new_example["output"]
        })

    # Save to JSON
    with open("train.json", "w") as f:
        json.dump(all_examples, f, indent=2)

    # Also save in instruction format
    with open("instructions.json", "w") as f:
        json.dump(instruction_examples, f, indent=2)

    print(f"Created {len(all_examples)} training examples")
    print(f"Sample: {all_examples[0]['text'][:200]}...")
    return all_examples


if __name__ == "__main__":
    create_sample_data()

Run it:
python prepare_data.py

4.2 Create Validation Data
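Validation records need to follow the same "### Instruction / ### Input / ### Response" template as the training set, with the Input section optional. The checker below is a small sketch for verifying that before upload. The names TEMPLATE and parse_record are illustrative and not part of the tutorial's scripts, and the regular expression encodes only the template as it appears in this guide's examples.

```python
import re

# Instruction section, optional Input section, non-empty Response section,
# separated by blank lines exactly as in the tutorial's examples.
TEMPLATE = re.compile(
    r"\A### Instruction:\n(?P<instruction>.+?)\n\n"
    r"(?:### Input:\n(?P<input>.+?)\n\n)?"
    r"### Response:\n(?P<response>.+)\Z",
    re.DOTALL,
)


def parse_record(record):
    """Split a {"text": ...} record into its parts, or raise ValueError."""
    match = TEMPLATE.match(record.get("text", ""))
    if match is None:
        raise ValueError("record does not follow the prompt template")
    return {
        "instruction": match["instruction"],
        "input": match["input"] or "",  # group is None when the section is absent
        "response": match["response"],
    }
```

To check a whole file, load it with json.load and call parse_record on each entry; any malformed record raises ValueError and points at the entry to fix.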
Create validation.json (also inside the data directory):
[
  {
    "text": "### Instruction:\nHow do I track my order?\n\n### Response:\nYou can track your order by logging into your account and going to 'Order History'. Click on the order number to see tracking details. You'll receive tracking emails at every major shipment milestone. For urgent inquiries, contact support@ourcompany.com."
  },
  {
    "text": "### Instruction:\nDo you have a mobile app?\n\n### Input:\nCustomer uses iPhone\n\n### Response:\nYes, we have both iOS and Android apps. You can download our iOS app from the App Store by searching 'OurCompany'. The app includes all website features plus push notifications for order updates and exclusive mobile-only deals."
  }
]

Phase 2: SageMaker Setup (20 minutes)
Step 5: Create S3 Bucket for Data & Models
5.1 Create Bucket
# Create unique bucket name (must be globally unique)
BUCKET_NAME="llama3-finetune-$(date +%s)-$RANDOM"
echo "Bucket name: $BUCKET_NAME"

# Create bucket
aws s3 mb s3://$BUCKET_NAME

# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key data/train/
aws s3api put-object --bucket $BUCKET_NAME --key data/validation/
aws s3api put-object --bucket $BUCKET_NAME --key models/
aws s3api put-object --bucket $BUCKET_NAME --key outputs/

5.2 Upload Data to S3
# Run these from the project root (cd .. first if you are still inside data/)

# Upload training data
aws s3 cp data/train.json s3://$BUCKET_NAME/data/train/train.json
aws s3 cp data/validation.json s3://$BUCKET_NAME/data/validation/validation.json

# Verify upload
aws s3 ls s3://$BUCKET_NAME/data/train/
aws s3 ls s3://$BUCKET_NAME/data/validation/

Step 6: Create SageMaker Training Script
Create scripts/train.py:
#!/usr/bin/env python3
# scripts/train.py
import argparse
import json
import logging
import os
from pathlib import Path

import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def load_config(config_path=None):
    """Load the JSON config, then apply hyperparameters passed as --key value."""
    # Only source_dir (scripts/) is uploaded to the training container, so copy
    # configs/training_config.json into scripts/ before launching a job.
    script_dir = Path(__file__).resolve().parent
    candidates = [
        Path(config_path) if config_path else None,
        script_dir / "training_config.json",  # shipped next to this script
        script_dir.parent / "configs" / "training_config.json",  # local layout
    ]
    path = next((p for p in candidates if p and p.is_file()), None)
    if path is None:
        raise FileNotFoundError(
            "training_config.json not found; copy it into scripts/ or pass config_path"
        )
    with open(path, "r") as f:
        config = json.load(f)

    # SageMaker passes estimator hyperparameters as command-line strings;
    # cast each one to the type of the matching value in the config file.
    parser = argparse.ArgumentParser()
    for key, value in config.items():
        if isinstance(value, (bool, list)):
            continue  # booleans and lists are read from the file only
        parser.add_argument(f"--{key}", type=type(value))
    overrides, _ = parser.parse_known_args()
    config.update({k: v for k, v in vars(overrides).items() if v is not None})
    return config


class LLMTrainer:
    def __init__(self, config_path=None):
        """Initialize trainer with configuration"""
        self.config = load_config(config_path)
        logger.info(f"Configuration loaded: {self.config}")

        # Set device
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        logger.info(f"Using device: {self.device}")

        # Token for gated models such as Llama 3 (None for open models)
        self.hf_token = os.environ.get("HF_TOKEN") or None

    def load_model_and_tokenizer(self):
        """Load base model and tokenizer"""
        logger.info(f"Loading model: {self.config['model_name']}")

        # Configure 4-bit quantization to save memory
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
        )

        # Load model with quantization
        self.model = AutoModelForCausalLM.from_pretrained(
            self.config["model_name"],
            quantization_config=bnb_config,
            device_map="auto",
            trust_remote_code=True,
            token=self.hf_token,
        )

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.config["model_name"],
            trust_remote_code=True,
            token=self.hf_token,
        )

        # Set padding token
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        logger.info(f"Model loaded: {self.config['model_name']}")
        logger.info(f"Tokenizer vocab size: {len(self.tokenizer)}")

    def prepare_model_for_training(self):
        """Apply LoRA configuration to model"""
        logger.info("Preparing model for LoRA training...")

        # Prepare model for k-bit training
        self.model = prepare_model_for_kbit_training(self.model)

        # Configure LoRA
        lora_config = LoraConfig(
            r=self.config["lora_r"],
            lora_alpha=self.config["lora_alpha"],
            target_modules=self.config["lora_target_modules"],
            lora_dropout=self.config["lora_dropout"],
            bias="none",
            task_type="CAUSAL_LM",
        )

        # Apply LoRA
        self.model = get_peft_model(self.model, lora_config)

        # Print trainable parameters
        self.model.print_trainable_parameters()

    def load_and_tokenize_data(self):
        """Load and tokenize training data"""
        logger.info("Loading training data...")

        # Get data paths from environment (SageMaker sets these)
        train_data_path = os.environ.get("SM_CHANNEL_TRAIN", "data/train")
        val_data_path = os.environ.get("SM_CHANNEL_VALIDATION", "data/validation")
        logger.info(f"Train data path: {train_data_path}")
        logger.info(f"Validation data path: {val_data_path}")

        # Load datasets
        train_files = [str(f) for f in Path(train_data_path).glob("*.json")]
        val_files = [str(f) for f in Path(val_data_path).glob("*.json")]
        train_dataset = load_dataset("json", data_files=train_files)
        val_dataset = load_dataset("json", data_files=val_files) if val_files else None

        # Tokenization function
        def tokenize_function(examples):
            return self.tokenizer(
                examples["text"],
                truncation=True,
                padding="max_length",
                max_length=self.config["max_length"],
            )

        # Tokenize datasets
        tokenized_train = train_dataset.map(
            tokenize_function,
            batched=True,
            remove_columns=train_dataset["train"].column_names,
        )["train"]
        tokenized_val = None
        if val_dataset:
            tokenized_val = val_dataset.map(
                tokenize_function,
                batched=True,
                remove_columns=val_dataset["train"].column_names,
            )["train"]

        logger.info(f"Training samples: {len(tokenized_train)}")
        if tokenized_val:
            logger.info(f"Validation samples: {len(tokenized_val)}")
        return tokenized_train, tokenized_val

    def train(self):
        """Main training loop"""
        logger.info("Starting training process...")

        # Load model and tokenizer
        self.load_model_and_tokenizer()

        # Prepare for LoRA training
        self.prepare_model_for_training()

        # Load and tokenize data
        train_dataset, val_dataset = self.load_and_tokenize_data()

        # Create data collator
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False,
        )

        # Set output directory (SageMaker packages this directory as model.tar.gz)
        output_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

        # Configure training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=self.config["num_epochs"],
            per_device_train_batch_size=self.config["batch_size"],
            per_device_eval_batch_size=self.config["batch_size"],
            gradient_accumulation_steps=self.config["gradient_accumulation_steps"],
            warmup_steps=self.config["warmup_steps"],
            logging_steps=self.config["logging_steps"],
            save_steps=self.config["save_steps"],
            eval_steps=self.config["eval_steps"] if val_dataset else None,
            evaluation_strategy="steps" if val_dataset else "no",
            save_strategy="steps",
            save_total_limit=2,
            load_best_model_at_end=bool(val_dataset),
            metric_for_best_model="eval_loss" if val_dataset else None,
            greater_is_better=False if val_dataset else None,
            learning_rate=self.config["learning_rate"],
            weight_decay=self.config["weight_decay"],
            fp16=False,
            bf16=self.config.get("bf16", False),
            gradient_checkpointing=self.config["gradient_checkpointing"],
            optim=self.config["optimizer"],
            report_to=["tensorboard"],
            ddp_find_unused_parameters=False,
            remove_unused_columns=False,
        )

        # Initialize Trainer
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
        )

        # Start training
        logger.info("Training started...")
        train_result = trainer.train()

        # Save model
        trainer.save_model()
        self.tokenizer.save_pretrained(output_dir)

        # Save training metrics
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        if val_dataset:
            eval_metrics = trainer.evaluate()
            trainer.log_metrics("eval", eval_metrics)
            trainer.save_metrics("eval", eval_metrics)

        logger.info(f"Training completed! Model saved to {output_dir}")
        return metrics


def main():
    """Main entry point"""
    try:
        # Check if running in SageMaker
        sm_training_env = os.environ.get("SM_TRAINING_ENV", "")
        if sm_training_env:
            logger.info(f"Running in SageMaker environment: {sm_training_env}")

        # Initialize and run trainer
        trainer = LLMTrainer()
        metrics = trainer.train()
        logger.info("Training completed successfully!")
        logger.info(f"Final metrics: {metrics}")
    except Exception as e:
        logger.error(f"Training failed with error: {str(e)}")
        raise


if __name__ == "__main__":
    main()

Create
configs/training_config.json:
{
  "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
  "num_epochs": 3,
  "batch_size": 2,
  "gradient_accumulation_steps": 4,
  "learning_rate": 2e-4,
  "weight_decay": 0.01,
  "warmup_steps": 100,
  "logging_steps": 50,
  "save_steps": 100,
  "eval_steps": 100,
  "max_length": 512,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
  "gradient_checkpointing": true,
  "bf16": true,
  "optimizer": "adamw_8bit"
}

Create scripts/requirements.txt:
transformers==4.36.0
datasets==2.14.0
accelerate==0.25.0
peft==0.7.0
bitsandbytes==0.41.3
torch==2.1.0
scikit-learn
sentencepiece
protobuf
einops

Step 7: Create SageMaker Entry Point Script
Create scripts/sagemaker_entry.py:
#!/usr/bin/env python3
# scripts/sagemaker_entry.py
import argparse
import subprocess
import sys


def install_requirements():
    """Install required packages"""
    print("Installing requirements...")
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "-r", "/opt/ml/code/requirements.txt"
    ])


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--train", action="store_true", help="Run training (default)")
    parser.add_argument("--serve", action="store_true", help="Run serving")
    # SageMaker passes the estimator's hyperparameters as extra --key value
    # arguments; parse_known_args ignores them here instead of exiting.
    args, _ = parser.parse_known_args()

    if args.serve:
        # For SageMaker deployment
        print("Serving mode - this would load the model for inference")
    else:
        # SageMaker does not pass --train, so training is the default mode.
        # Install dependencies first
        install_requirements()

        # Run training
        print("Starting training...")
        from train import main as train_main
        train_main()


if __name__ == "__main__":
    main()

Phase 3: Launch Training (15 minutes)
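Before launching, it is worth checking how the step-based settings in configs/training_config.json relate to the length of the run. The sketch below does that arithmetic for the 100-example demo set. It assumes a single GPU and that the number of optimizer updates per epoch is roughly the number of batches divided by the accumulation steps; the function names are illustrative and not part of the tutorial's scripts.

```python
def optimizer_steps(num_samples, batch_size, grad_accum, num_epochs, num_devices=1):
    """Rough count of optimizer updates for a run."""
    # Ceil division: a final partial batch still counts as a batch
    batches_per_epoch = -(-num_samples // (batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * num_epochs


def check_schedule(total_steps, warmup_steps, eval_steps, save_steps):
    """Return warnings for step settings that never take effect in this run."""
    warnings = []
    if warmup_steps >= total_steps:
        warnings.append(f"warmup_steps={warmup_steps} covers the whole run ({total_steps} steps)")
    if eval_steps > total_steps:
        warnings.append(f"eval_steps={eval_steps} is never reached")
    if save_steps > total_steps:
        warnings.append(f"save_steps={save_steps} is never reached")
    return warnings


# Values from training_config.json with the 100-example demo data set
total = optimizer_steps(num_samples=100, batch_size=2, grad_accum=4, num_epochs=3)
for warning in check_schedule(total, warmup_steps=100, eval_steps=100, save_steps=100):
    print(warning)
```

With the demo data the run is only a few dozen updates long, so the 100-step warmup, evaluation, and save intervals are all longer than the run itself. For a data set this small, lowering those values (or using a larger data set) lets the learning rate reach its configured peak and lets evaluation and checkpointing actually happen.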
Step 8: Create Launch Script
Create launch_training.py:
#!/usr/bin/env python3
# launch_training.py
import os
import time
from datetime import datetime

import boto3
from sagemaker.huggingface import HuggingFace

# Replace with the bucket name you created in Step 5
BUCKET_NAME = "llama3-finetune-1234567890"


def create_training_job():
    """Create and launch SageMaker training job"""
    # Configuration
    config = {
        "job_name": f"llama-finetune-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
        "instance_type": "ml.g5.2xlarge",  # Cheapest GPU with enough memory
        "instance_count": 1,
        "volume_size": 200,  # GB
        "max_run_hours": 4,
        "use_spot_instances": True,
        "max_wait_hours": 8,
        "bucket_name": BUCKET_NAME,
        "role_arn": None,  # Looked up below
    }

    # Initialize session
    session = boto3.Session()
    sm_client = session.client("sagemaker")

    # Get SageMaker execution role
    if not config["role_arn"]:
        # Try to find a default role
        try:
            iam = session.client("iam")
            roles = iam.list_roles(PathPrefix="/service-role/")
            for role in roles["Roles"]:
                if "AmazonSageMaker-ExecutionRole" in role["RoleName"]:
                    config["role_arn"] = role["Arn"]
                    break
        except Exception as e:
            print(f"Could not list IAM roles: {e}")

    if not config["role_arn"]:
        # The role has to be created through the AWS Console first
        print("No SageMaker role found.")
        print("Please create a SageMaker execution role:")
        print("1. Go to IAM Console")
        print("2. Create role")
        print("3. Select 'SageMaker' as use case")
        print("4. Attach policies: AmazonSageMakerFullAccess, AmazonS3FullAccess")
        print("5. Name: AmazonSageMaker-ExecutionRole")
        print("6. Copy the ARN and paste it below")
        config["role_arn"] = input("Enter SageMaker Execution Role ARN: ")

    # Create HuggingFace estimator
    print(f"Creating training job: {config['job_name']}")

    # Hyperparameters (passed to the entry point as command-line strings)
    hyperparameters = {
        "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
        "num_epochs": "3",
        "batch_size": "2",
        "learning_rate": "2e-4",
        "lora_r": "16",
    }

    # Environment variables
    environment = {
        "HF_TOKEN": os.environ.get("HF_TOKEN", ""),  # For Llama 3 access
        "MODEL_CACHE": "/opt/ml/model",
    }

    # Create estimator
    estimator = HuggingFace(
        entry_point="sagemaker_entry.py",
        source_dir="scripts",
        instance_type=config["instance_type"],
        instance_count=config["instance_count"],
        volume_size=config["volume_size"],
        role=config["role_arn"],
        transformers_version="4.36.0",
        pytorch_version="2.1.0",
        py_version="py310",
        hyperparameters=hyperparameters,
        environment=environment,
        max_run=config["max_run_hours"] * 3600,
        use_spot_instances=config["use_spot_instances"],
        max_wait=config["max_wait_hours"] * 3600 if config["use_spot_instances"] else None,
        output_path=f"s3://{config['bucket_name']}/outputs/",
        code_location=f"s3://{config['bucket_name']}/code/",
        disable_profiler=True,
        debugger_hook_config=False,
    )

    # Define input data configuration
    inputs = {
        "train": f"s3://{config['bucket_name']}/data/train/",
        "validation": f"s3://{config['bucket_name']}/data/validation/",
    }

    # Launch training job
    print("Launching training job...")
    estimator.fit(inputs, job_name=config["job_name"], wait=False)

    # Get job details
    job_description = sm_client.describe_training_job(
        TrainingJobName=config["job_name"]
    )
    cost = estimate_cost(
        config["instance_type"], config["max_run_hours"], config["use_spot_instances"]
    )

    print("\n✅ Training job launched successfully!")
    print(f"Job Name: {config['job_name']}")
    print(f"Job ARN: {job_description['TrainingJobArn']}")
    print(f"Instance: {config['instance_type']}")
    print(f"Spot Instances: {config['use_spot_instances']}")
    print(f"Estimated cost: ${cost}")
    print(
        f"\nMonitor job at: https://{session.region_name}.console.aws.amazon.com/"
        f"sagemaker/home?region={session.region_name}#/training-jobs/{config['job_name']}"
    )
    return config["job_name"]


def estimate_cost(instance_type, hours, use_spot=True):
    """Rough cost estimation"""
    # Illustrative hourly rates; check current SageMaker pricing for your region
    pricing = {
        "ml.g5.2xlarge": 1.212,  # per hour
        "ml.g5.4xlarge": 2.176,
        "ml.g5.8xlarge": 4.352,
        "ml.g5.12xlarge": 6.528,
    }
    cost = pricing.get(instance_type, 1.5) * hours
    if use_spot:
        cost *= 0.3  # assumes a ~70% spot discount; actual savings vary
    return round(cost, 2)


def monitor_job(job_name):
    """Monitor training job progress"""
    client = boto3.client("sagemaker")
    print(f"\nMonitoring job: {job_name}")
    print("=" * 50)

    status = "InProgress"
    response = {}
    while status in ["InProgress", "Starting"]:
        try:
            response = client.describe_training_job(TrainingJobName=job_name)
            status = response["TrainingJobStatus"]
            if "TrainingStartTime" in response:
                elapsed = (time.time() - response["TrainingStartTime"].timestamp()) / 60
                print(f"Status: {status} | Elapsed: {elapsed:.1f} min", end="\r")
            if "FinalMetricDataList" in response:
                for metric in response["FinalMetricDataList"]:
                    print(f"{metric['MetricName']}: {metric['Value']}")
            time.sleep(30)
        except Exception as e:
            print(f"\nError monitoring: {e}")
            break

    print(f"\nFinal Status: {status}")
    if status == "Completed":
        print("✅ Training completed successfully!")
        print(f"Model artifacts: {response.get('ModelArtifacts', {}).get('S3ModelArtifacts', 'N/A')}")
    elif status == "Failed":
        print("❌ Training failed!")
        print(f"Failure reason: {response.get('FailureReason', 'Unknown')}")
    return status


def main():
    """Main function"""
    print("=" * 60)
    print("Llama 3 Fine-Tuning on SageMaker - Launch Script")
    print("=" * 60)

    # Step 1: Create training job
    job_name = create_training_job()

    # Step 2: Ask if user wants to monitor
    monitor = input("\nDo you want to monitor the job? (yes/no): ").lower()
    if monitor in ["yes", "y"]:
        monitor_job(job_name)

    # Step 3: Show next steps
    print("\n" + "=" * 60)
    print("NEXT STEPS:")
    print("=" * 60)
    print("1. Wait for training to complete (2-4 hours)")
    print("2. Check S3 for model artifacts:")
    print(f"   aws s3 ls s3://{BUCKET_NAME}/outputs/{job_name}/")
    print("3. Deploy the model:")
    print("   python deploy_model.py --job-name " + job_name)
    print("\nTo check status manually:")
    print(f"   aws sagemaker describe-training-job --training-job-name {job_name}")


if __name__ == "__main__":
    main()

Step 9: Run the Training!
# Make scripts executable
chmod +x launch_training.py
chmod +x scripts/*.py

# Run the launch script
python launch_training.py

# Or run directly with minimal setup
python -c "
import boto3
from sagemaker.huggingface import HuggingFace

# Quick start - minimal configuration
estimator = HuggingFace(
    entry_point='train.py',
    source_dir='scripts',
    instance_type='ml.g5.2xlarge',
    instance_count=1,
    role='your-sagemaker-role-arn',  # Replace with your role
    transformers_version='4.36',
    pytorch_version='2.1',
    py_version='py310',
    hyperparameters={
        'model_name': 'mistralai/Mistral-7B-Instruct-v0.1',
        'num_epochs': 1,  # Start with 1 epoch for testing
    }
)

# Start training
estimator.fit({
    'train': 's3://your-bucket/data/train/',
    'validation': 's3://your-bucket/data/validation/'
}, wait=True)
"

Phase 4: Monitor & Deploy (After Training Completes)
Step 10: Check Training Results
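For managed spot training, the savings figure comes from two fields that describe_training_job returns: TrainingTimeInSeconds and BillableTimeInSeconds. The sketch below isolates that arithmetic so it can be checked by hand. The function names and the hourly rate in the example are illustrative assumptions, not real pricing.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved: share of the training time that was not billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


def billed_cost(billable_seconds, hourly_rate):
    """Cost of the billable time at a given on-demand hourly rate."""
    return billable_seconds / 3600 * hourly_rate


# Example: a 1-hour job billed for 15 minutes at a hypothetical $2.00/hour
print(spot_savings_percent(3600, 900))  # 75.0
print(billed_cost(900, 2.0))            # 0.5
```

Working from BillableTimeInSeconds gives a figure tied to the job's own record, whereas multiplying the full duration by a fixed discount factor is only an estimate.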
Create check_results.py:
#!/usr/bin/env python3
# check_results.py
import os
import sys
import tarfile
from urllib.parse import urlparse

import boto3


def check_training_job(job_name):
    """Check training job status and results"""
    client = boto3.client("sagemaker")
    try:
        response = client.describe_training_job(TrainingJobName=job_name)
        print(f"Job Name: {response['TrainingJobName']}")
        print(f"Status: {response['TrainingJobStatus']}")
        print(f"Creation Time: {response['CreationTime']}")

        if "TrainingEndTime" in response:
            print(f"End Time: {response['TrainingEndTime']}")
            # TrainingStartTime is absent if the job failed before training began
            if "TrainingStartTime" in response:
                duration = (
                    response["TrainingEndTime"] - response["TrainingStartTime"]
                ).total_seconds() / 3600
                print(f"Duration: {duration:.2f} hours")

        if "ModelArtifacts" in response:
            print(f"\nModel Artifacts: {response['ModelArtifacts']['S3ModelArtifacts']}")

        if "FinalMetricDataList" in response:
            print("\nFinal Metrics:")
            for metric in response["FinalMetricDataList"]:
                print(f"  {metric['MetricName']}: {metric['Value']:.4f}")

        # Check for Spot training savings
        use_spot = response.get("EnableManagedSpotTraining", False)
        total_time = response.get("TrainingTimeInSeconds", 0)
        if use_spot:
            billable_time = response.get("BillableTimeInSeconds", 0)
            if total_time > 0:
                savings = (1 - (billable_time / total_time)) * 100
                print(f"\nSpot Training Savings: {savings:.1f}%")
                print(f"Billable time: {billable_time/3600:.1f}h")
                print(f"Total time: {total_time/3600:.1f}h")

        # Estimate cost
        instance_type = response["ResourceConfig"]["InstanceType"]
        duration_hours = total_time / 3600

        # Rough pricing (varies by region)
        pricing = {
            "ml.g5.2xlarge": 1.212,
            "ml.g5.4xlarge": 2.176,
            "ml.g5.8xlarge": 4.352,
        }
        hourly_rate = pricing.get(instance_type, 1.5)
        cost = hourly_rate * duration_hours
        if use_spot:
            cost *= 0.3  # ~70% discount
        print(f"\nEstimated Cost: ${cost:.2f}")
        return response
    except Exception as e:
        print(f"Error: {e}")
        return None


def download_model(job_name, local_dir="model_output"):
    """Download trained model from S3"""
    # Get model artifacts location
    client = boto3.client("sagemaker")
    response = client.describe_training_job(TrainingJobName=job_name)
    if "ModelArtifacts" not in response:
        print("No model artifacts found")
        return None
    s3_path = response["ModelArtifacts"]["S3ModelArtifacts"]

    # Parse S3 URL
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")

    # Create local directory
    os.makedirs(local_dir, exist_ok=True)

    # Download file
    local_file = os.path.join(local_dir, "model.tar.gz")
    print(f"Downloading model from s3://{bucket}/{key}")
    print(f"To: {local_file}")
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_file)

    # Extract if it's a tar file
    if local_file.endswith(".tar.gz"):
        print("Extracting model...")
        with tarfile.open(local_file, "r:gz") as tar:
            tar.extractall(path=local_dir)
        # Remove tar file
        os.remove(local_file)

    print(f"Model downloaded to: {local_dir}")

    # List contents
    print("\nModel contents:")
    for root, dirs, files in os.walk(local_dir):
        for file in files[:10]:  # Show first 10 files per directory
            print(f"  {os.path.join(root, file)}")
    return local_dir


if __name__ == "__main__":
    if len(sys.argv) > 1:
        job_name = sys.argv[1]
    else:
        job_name = input("Enter training job name: ")

    print(f"Checking job: {job_name}")
    print("=" * 60)
    result = check_training_job(job_name)
    if result and result["TrainingJobStatus"] == "Completed":
        download = input("\nDownload model? (yes/no): ").lower()
        if download in ["yes", "y"]:
            download_model(job_name)

Run it:
# After training completes
python check_results.py your-job-name-here

Step 11: Deploy the Model
Create deploy_model.py:
#!/usr/bin/env python3
# deploy_model.py
import argparse
import json

import boto3
from sagemaker import Session
from sagemaker.huggingface import HuggingFaceModel


def deploy_finetuned_model(job_name, endpoint_name=None):
    """Deploy the fine-tuned model to a SageMaker endpoint"""
    # Initialize
    session = Session()
    region = session.boto_region_name
    if not endpoint_name:
        endpoint_name = f"ft-{job_name[:30]}"  # Limit to 30 chars

    print(f"Deploying model from job: {job_name}")
    print(f"Endpoint name: {endpoint_name}")
    print(f"Region: {region}")

    # Get model artifacts location
    sm_client = boto3.client("sagemaker", region_name=region)
    try:
        job_info = sm_client.describe_training_job(TrainingJobName=job_name)
        model_s3_path = job_info["ModelArtifacts"]["S3ModelArtifacts"]
        print(f"Model artifacts: {model_s3_path}")
    except Exception as e:
        print(f"Error getting job info: {e}")
        # The bucket cannot be derived from the job name, so ask for the path
        model_s3_path = input("Enter full S3 path to model.tar.gz: ")

    # Create HuggingFace model
    # Note: train.py saves LoRA adapter weights. Depending on the serving
    # container, you may need to merge the adapter into the base model or
    # supply a custom inference script before the endpoint can load it.
    print("\nCreating model object...")
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_path,
        role="your-sagemaker-role-arn",  # Replace with your role
        transformers_version="4.36.0",
        pytorch_version="2.1.0",
        py_version="py310",
        env={
            "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",
            "SM_NUM_GPUS": "1",
            "MAX_INPUT_LENGTH": "512",
            "MAX_TOTAL_TOKENS": "1024",
        },
    )

    # Deploy to endpoint
    print("Deploying endpoint (this will take 5-10 minutes)...")
    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.xlarge",  # Smaller than training instance
        endpoint_name=endpoint_name,
        wait=True,
    )

    print("\n✅ Endpoint deployed successfully!")
    print(f"Endpoint name: {predictor.endpoint_name}")
    print("Instance type: ml.g5.xlarge")

    # Test the endpoint
    print("\nTesting endpoint...")
    test_prompt = {
        "inputs": "### Instruction:\nHow do I reset my password?\n\n### Response:",
        "parameters": {
            "max_new_tokens": 200,
            "temperature": 0.7,
            "top_p": 0.9,
            "do_sample": True,
        },
    }
    try:
        response = predictor.predict(test_prompt)
        print("Test response:")
        print(json.dumps(response, indent=2)[:500] + "...")
    except Exception as e:
        print(f"Test failed: {e}")
    return predictor


def test_endpoint(endpoint_name):
    """Test an existing endpoint"""
    runtime = boto3.client("sagemaker-runtime")
    prompt = {
        "inputs": "### Instruction:\nWhat's your refund policy?\n\n### Response:",
        "parameters": {
            "max_new_tokens": 100,
            "temperature": 0.1,  # Lower temperature for more focused responses
        },
    }
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(prompt),
    )
    result = json.loads(response["Body"].read().decode())
    print("Response from endpoint:")
    print(result[0]["generated_text"])
    return result


def cleanup(endpoint_name):
    """Delete endpoint to stop charges"""
    print(f"Deleting endpoint: {endpoint_name}")
    sm_client = boto3.client("sagemaker")
    try:
        # Look up the config name first; it cannot be queried once the
        # endpoint itself has been deleted
        endpoint_info = sm_client.describe_endpoint(EndpointName=endpoint_name)
        config_name = endpoint_info["EndpointConfigName"]

        sm_client.delete_endpoint(EndpointName=endpoint_name)
        print(f"Endpoint {endpoint_name} deleted")

        # Also delete endpoint config
        sm_client.delete_endpoint_config(EndpointConfigName=config_name)
        print(f"Endpoint config {config_name} deleted")
    except Exception as e:
        print(f"Error deleting endpoint: {e}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Deploy fine-tuned model")
    # --job-name is only needed for deployment, not for --test or --cleanup
    parser.add_argument("--job-name", help="Training job name")
    parser.add_argument("--endpoint-name", help="Endpoint name (optional)")
    parser.add_argument("--test", action="store_true", help="Test existing endpoint")
    parser.add_argument("--cleanup", action="store_true", help="Delete endpoint")
    args = parser.parse_args()

    if args.cleanup and args.endpoint_name:
        cleanup(args.endpoint_name)
    elif args.test and args.endpoint_name:
        test_endpoint(args.endpoint_name)
    elif args.job_name:
        deploy_finetuned_model(args.job_name, args.endpoint_name)
    else:
        parser.error("--job-name is required to deploy; --test and --cleanup need --endpoint-name")

Run deployment:
# Deploy the model
python deploy_model.py --job-name your-training-job-name

# Test the endpoint
python deploy_model.py --test --endpoint-name ft-your-job-name

# Clean up (important to avoid charges!)
python deploy_model.py --cleanup --endpoint-name ft-your-job-name

Phase 5: Production Considerations
Step 12: Create Production Setup Script
Create production_setup.py:
python
#!/usr/bin/env python3
# production_setup.py
import json


def create_ci_cd_pipeline():
    """Create CI/CD pipeline configuration"""
    pipeline_config = {
        "name": "llama-finetune-pipeline",
        "stages": [
            {
                "name": "DataValidation",
                "script": "scripts/validate_data.py",
                "instance": "ml.m5.large",
                "timeout": 1800
            },
            {
                "name": "Training",
                "script": "scripts/train.py",
                "instance": "ml.g5.2xlarge",
                "use_spot": True,
                "hyperparameters": {
                    "model_name": "mistralai/Mistral-7B-Instruct-v0.1",
                    "num_epochs": 3,
                    "learning_rate": "2e-4"
                }
            },
            {
                "name": "Evaluation",
                "script": "scripts/evaluate.py",
                "instance": "ml.g5.xlarge",
                "metrics": ["accuracy", "perplexity", "bleu"]
            },
            {
                "name": "Deployment",
                "condition": "evaluation.accuracy > 0.85",
                "instance": "ml.g5.xlarge",
                "auto_scale": {"min_capacity": 1, "max_capacity": 5}
            }
        ],
        "monitoring": {
            "cloudwatch_metrics": [
                "Invocations",
                "ModelLatency",
                "CPUUtilization",
                "MemoryUtilization"
            ],
            "alarms": [
                {
                    "metric": "ModelLatency",
                    # CloudWatch reports ModelLatency in microseconds, so 1 second = 1,000,000
                    "threshold": 1000000,
                    "periods": 2
                },
                {
                    "metric": "Invocations",
                    "threshold": 1000,  # per minute
                    "periods": 5
                }
            ]
        },
        "cost_tracking": {
            "daily_budget": 50,
            "alarm_threshold": 80,
            "report_frequency": "daily"
        }
    }

    # Save pipeline config
    with open('pipeline_config.json', 'w') as f:
        json.dump(pipeline_config, f, indent=2)

    print("✅ CI/CD pipeline configuration created")
    print("Next steps:")
    print("1. Review pipeline_config.json")
    print("2. Set up CodePipeline in AWS Console")
    print("3. Configure S3 triggers for automatic retraining")
    print("4. Set up CloudWatch alarms for monitoring")
    return pipeline_config


def create_monitoring_dashboard():
    """Create CloudWatch dashboard configuration"""
    # Endpoint metrics need both the EndpointName and VariantName dimensions.
    # "AllTraffic" is the default variant name used by the SageMaker Python SDK.
    endpoint_dims = ["EndpointName", "your-endpoint", "VariantName", "AllTraffic"]
    dashboard = {
        "widgets": [
            {
                "type": "metric",
                "properties": {
                    "metrics": [
                        ["AWS/SageMaker", "Invocations", *endpoint_dims],
                        ["AWS/SageMaker", "ModelLatency", *endpoint_dims]
                    ],
                    "view": "timeSeries",
                    "stacked": False,
                    "region": "us-east-1",
                    "title": "Endpoint Performance"
                }
            },
            {
                "type": "metric",
                "properties": {
                    # Instance-level metrics live in the /aws/sagemaker/Endpoints namespace
                    "metrics": [
                        ["/aws/sagemaker/Endpoints", "CPUUtilization", *endpoint_dims],
                        ["/aws/sagemaker/Endpoints", "MemoryUtilization", *endpoint_dims]
                    ],
                    "view": "gauge",
                    "region": "us-east-1",
                    "title": "Resource Utilization"
                }
            },
            {
                "type": "text",
                "properties": {
                    # Static placeholder text; CloudWatch does not update these numbers
                    "markdown": (
                        "# Fine-Tuned Model Dashboard\n\n"
                        "## Key Metrics\n"
                        "- **Cost Today**: $12.45\n"
                        "- **Total Invocations**: 12,345\n"
                        "- **Avg Latency**: 245ms\n"
                        "- **Error Rate**: 0.12%\n\n"
                        "## Actions\n"
                        "- [View Detailed Logs](https://console.aws.amazon.com/cloudwatch/home)\n"
                        "- [Open SageMaker Console](https://console.aws.amazon.com/sagemaker/home)"
                    )
                }
            }
        ]
    }

    with open('dashboard_config.json', 'w') as f:
        json.dump(dashboard, f, indent=2)

    print("✅ Dashboard configuration created")
    return dashboard


def create_cost_estimator():
    """Create cost estimation tool"""
    # Hourly rates in USD. These are illustrative values for this guide;
    # check the current SageMaker pricing for your region before relying on them.
    estimator = {
        "instance_pricing": {
            "ml.g5.xlarge": {"on_demand": 1.212, "spot": 0.3636},
            "ml.g5.2xlarge": {"on_demand": 2.176, "spot": 0.6528},
            "ml.g5.4xlarge": {"on_demand": 4.352, "spot": 1.3056},
            "ml.g5.8xlarge": {"on_demand": 8.704, "spot": 2.6112},
            "ml.g5.12xlarge": {"on_demand": 13.056, "spot": 3.9168}
        },
        # Costs below are on-demand rate x hours
        "training_estimator": {
            "small": {"instances": "ml.g5.2xlarge", "hours": 4, "cost": 8.70},
            "medium": {"instances": "ml.g5.4xlarge", "hours": 8, "cost": 34.82},
            "large": {"instances": "ml.g5.8xlarge", "hours": 16, "cost": 139.26}
        },
        "inference_estimator": {
            "low_traffic": {"instances": "ml.g5.xlarge", "hours": 24, "cost": 29.09},
            "medium_traffic": {"instances": "ml.g5.2xlarge", "hours": 24, "cost": 52.22},
            "high_traffic": {"instances": "ml.g5.4xlarge", "hours": 24, "cost": 104.45}
        }
    }

    with open('cost_estimator.json', 'w') as f:
        json.dump(estimator, f, indent=2)

    print("✅ Cost estimator created")

    # Create simple Python calculator
    calculator_code = '''
def estimate_training_cost(instance_type, hours, use_spot=True):
    """Estimate training cost"""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
        "ml.g5.4xlarge": 4.352,
        "ml.g5.8xlarge": 8.704,
    }
    hourly = pricing.get(instance_type, 2.0)
    if use_spot:
        hourly *= 0.3  # Assumes a 70% discount; actual spot savings vary
    return hourly * hours


def estimate_monthly_inference(instance_type, requests_per_day, avg_latency_ms=200):
    """Estimate monthly inference cost"""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
    }
    # Calculate instance hours needed.
    # This counts busy time only: a real-time endpoint is billed for every
    # hour it is running, so treat the result as a lower bound.
    total_processing_seconds = requests_per_day * (avg_latency_ms / 1000)
    instance_hours = total_processing_seconds / 3600
    # Add 20% buffer
    instance_hours *= 1.2
    hourly = pricing.get(instance_type, 1.5)
    daily_cost = hourly * instance_hours
    monthly_cost = daily_cost * 30
    return {
        "daily_cost": round(daily_cost, 2),
        "monthly_cost": round(monthly_cost, 2),
        "instance_hours_per_day": round(instance_hours, 2)
    }
'''

    with open('cost_calculator.py', 'w') as f:
        f.write(calculator_code)

    return estimator


if __name__ == "__main__":
    print("Setting up production configuration...")
    print("=" * 60)

    # Create all configurations
    pipeline = create_ci_cd_pipeline()
    dashboard = create_monitoring_dashboard()
    cost_config = create_cost_estimator()

    print("\n" + "=" * 60)
    print("PRODUCTION SETUP COMPLETE")
    print("=" * 60)
    print("\nCreated files:")
    print("1. pipeline_config.json - CI/CD pipeline configuration")
    print("2. dashboard_config.json - CloudWatch dashboard")
    print("3. cost_estimator.json - Cost estimation data")
    print("4. cost_calculator.py - Python cost calculator")
    print("\nNext steps for production:")
    print("1. Set up AWS Budgets with alerts")
    print("2. Configure VPC for private endpoint access")
    print("3. Set up logging to S3 for compliance")
    print("4. Implement A/B testing for model versions")
    print("5. Create automated retraining pipeline")

Troubleshooting Common Issues
Issue 1: Out of memory
python
# Add to the TrainingArguments in your training script:
training_args = TrainingArguments(
    # ... keep your existing arguments ...
    gradient_checkpointing=True,    # Recomputes activations to reduce GPU memory
    gradient_accumulation_steps=4,  # Simulates a larger batch
    fp16=False,                     # Use bf16 instead
    bf16=True,
)

Issue 2: Training too slow
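python
# Switch to a multi-GPU instance
# ml.g5.2xlarge → ml.g5.12xlarge (4 A10G GPUs instead of 1, at a higher hourly rate)
# ml.g5.4xlarge and ml.g5.8xlarge still have a single GPU; they only add vCPUs and RAM
# Use gradient accumulation when a larger batch size does not fit in GPU memory

Issue 3: Model not learning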
python
# Check your data format
# Lower the learning rate: 2e-4 → 1e-4
# Increase epochs: 3 → 5
# Add more diverse training examples

Quick Start - One Command Setup
Create setup.sh:
bash
#!/bin/bash
# setup.sh - Complete setup script
set -e  # Stop at the first failed step

echo "🚀 Starting Llama 3 Fine-Tuning Setup..."
echo "=========================================="

# Step 1: Setup environment
echo "1. Setting up Python environment..."
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Step 2: Prepare data
echo "2. Preparing sample data..."
python data/prepare_data.py

# Step 3: Setup AWS (interactive)
echo "3. Setting up AWS..."
read -r -p "Enter your SageMaker Role ARN: " ROLE_ARN
read -r -p "Enter S3 bucket name: " BUCKET_NAME
# Exported in case launch_training.py reads them from the environment
export ROLE_ARN BUCKET_NAME

# Step 4: Upload to S3
echo "4. Uploading to S3..."
# Create the bucket only if it is not already accessible
aws s3 ls "s3://$BUCKET_NAME" > /dev/null 2>&1 || aws s3 mb "s3://$BUCKET_NAME"
aws s3 cp data/train.json "s3://$BUCKET_NAME/data/train/"
aws s3 cp data/validation.json "s3://$BUCKET_NAME/data/validation/"

# Step 5: Launch training
echo "5. Launching training job..."
python launch_training.py

echo "✅ Setup complete!"
echo "Training job launched. Check AWS Console for progress."

Make it executable and run:
bash
chmod +x setup.sh
./setup.sh
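Before running setup.sh, it can help to confirm that the files it calls are actually in place. The following is a minimal pre-flight sketch and not part of the original script: the helper name `missing_prerequisites` is hypothetical, and the file list is simply taken from the commands that setup.sh runs.

```python
from pathlib import Path

# Files that setup.sh reads or executes, relative to the project root.
REQUIRED_FILES = (
    "requirements.txt",
    "data/prepare_data.py",
    "launch_training.py",
)


def missing_prerequisites(project_root=".", required=REQUIRED_FILES):
    """Return the required files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files needed by setup.sh are present.")
```

Running this from the project directory first avoids creating the S3 bucket and then failing at a later step.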
Summary: Your Complete Path
Hour 0-1: Setup AWS, install dependencies, prepare data
Hour 1-2: Configure SageMaker, upload data to S3
Hour 2-3: Launch training job (runs for 2-4 hours)
Hour 6-7: Check results, download model
Hour 7-8: Deploy endpoint, test inference
Hour 8+: Set up monitoring, CI/CD, production features
Total hands-on time: 2-3 hours
Total wait time: 2-4 hours (training) + 10-15 minutes (deployment)
Total cost: $10-50 depending on configuration
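The cost figure can be sanity-checked with simple arithmetic. The sketch below reuses the illustrative hourly rates and the 70% spot discount from cost_calculator.py above; both are assumptions made in this guide rather than quoted AWS prices, and `training_cost` and `endpoint_cost` are hypothetical helper names.

```python
# Illustrative hourly rates (USD) copied from this guide's cost calculator.
# They are assumptions, not quoted AWS prices; check current pricing for your region.
HOURLY_RATE = {
    "ml.g5.xlarge": 1.212,
    "ml.g5.2xlarge": 2.176,
}
SPOT_DISCOUNT = 0.70  # assumed saving; real spot savings vary


def training_cost(instance_type, hours, use_spot=True):
    """Cost of one training job billed for `hours` of instance time."""
    rate = HOURLY_RATE[instance_type]
    if use_spot:
        rate *= 1 - SPOT_DISCOUNT
    return round(rate * hours, 2)


def endpoint_cost(instance_type, hours_running):
    """Cost of a real-time endpoint, billed for every hour it is up."""
    return round(HOURLY_RATE[instance_type] * hours_running, 2)


if __name__ == "__main__":
    print("4 h training, spot:     ", training_cost("ml.g5.2xlarge", 4))
    print("4 h training, on-demand:", training_cost("ml.g5.2xlarge", 4, use_spot=False))
    print("1 h endpoint test:      ", endpoint_cost("ml.g5.xlarge", 1))
    print("Endpoint left up 30 d:  ", endpoint_cost("ml.g5.xlarge", 24 * 30))
```

Under these assumptions the training job is a small part of the bill, and an endpoint that is left running dominates it, which is why the cleanup command matters.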
Need help? Common issues and solutions:
Permission errors: Make sure your SageMaker execution role has the AmazonSageMakerFullAccess policy attached
Out of memory: Reduce batch size, enable gradient checkpointing
Training too slow: Use a multi-GPU instance such as ml.g5.12xlarge; spot instances lower cost, not training time
Model not loading: Check Hugging Face token for Llama 3 access
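For the out-of-memory fix, shrinking the batch size also shrinks the effective batch size unless gradient accumulation is raised to compensate. A minimal sketch of that arithmetic follows. The helper names are hypothetical, the batch size of 4 is an example value, and the formula assumes the usual convention that the effective batch size is per-device batch size × accumulation steps × number of GPUs.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_gpus=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_gpus


def accumulation_steps_for(target_batch, per_device_batch, num_gpus=1):
    """Accumulation steps needed to keep target_batch with a smaller per-device batch."""
    per_step = per_device_batch * num_gpus
    if target_batch % per_step != 0:
        raise ValueError("target_batch must be a multiple of per_device_batch * num_gpus")
    return target_batch // per_step


if __name__ == "__main__":
    # Example starting point: batch of 4 per device with 4 accumulation steps.
    target = effective_batch_size(4, 4)
    # After an out-of-memory error, halve the per-device batch and recompute the steps.
    steps = accumulation_steps_for(target, per_device_batch=2)
    print(f"effective batch {target}: use per-device batch 2 with {steps} accumulation steps")
```

Keeping the effective batch size constant this way means the learning rate usually does not need to be retuned after the change.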
This is the complete, end-to-end guide with every single step. Copy and run each command in order, and you'll have a fine-tuned model running in production.




