<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Om Thakur Blogs]]></title><description><![CDATA[Cloud Engineer | AWS Solution Architect Certified | 4X AWS Certified | Cloud Instructor]]></description><link>https://blog.omprakashthakur.com.np</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 20:55:10 GMT</lastBuildDate><atom:link href="https://blog.omprakashthakur.com.np/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How to Fine-Tune Llama 3 on AWS Without Breaking the Bank: A Practical Guide]]></title><description><![CDATA[Phase 1: Setup & Preparation (30-45 minutes)
Step 1: AWS Account & Permissions Setup
1.1 Login to AWS Console

Go to https://aws.amazon.com and sign in

If new, create account (has free tier but will need payment method)

1.2 Create IAM User for Sage...]]></description><link>https://blog.omprakashthakur.com.np/how-to-fine-tune-llama-3-on-aws-without-breaking-the-bank-a-practical-guide</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/how-to-fine-tune-llama-3-on-aws-without-breaking-the-bank-a-practical-guide</guid><category><![CDATA[primary_tags: ["AWS SageMaker", "Fine-Tuning", "Llama 3", "Cost Optimization", "LLM"] secondary_tags: ["LoRA", "Spot Instances", "Model Training", "AWS Cost", "Open Source AI", "Mistral", "Hugging Face"] long_tail_tags: ["fine-tune llama3 on aws", "sagemaker training cost", "aws spot instances savings", "lora fine-tuning tutorial", "custom ai model cheap", "aws gpu cost optimization", "train llm on budget"]]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Fri, 02 Jan 2026 06:07:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767262301074/d82c5744-6555-4e13-850d-15e4739e1717.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-phase-1-setup-amp-preparation-30-45-minutes"><strong>Phase 1: Setup &amp; Preparation (30-45 minutes)</strong></h2>
<h3 id="heading-step-1-aws-account-amp-permissions-setup"><strong>Step 1: AWS Account &amp; Permissions Setup</strong></h3>
<p><strong>1.1 Login to AWS Console</strong></p>
<ul>
<li><p>Go to <a target="_blank" href="https://aws.amazon.com/">https://aws.amazon.com</a> and sign in</p>
</li>
<li><p>If you're new, create an account (there's a free tier, but a payment method is still required)</p>
</li>
<li><p><strong>1.2 Create IAM User for SageMaker (Don't use root!)</strong></p>
<ol>
<li><p>Go to IAM Service</p>
</li>
<li><p>Click "Users" → "Create user"</p>
</li>
<li><p>Username: sagemaker-user</p>
</li>
<li><p>Select "Attach policies directly"</p>
</li>
<li><p>Add these policies:</p>
<ul>
<li><p>AmazonSageMakerFullAccess</p>
</li>
<li><p>AmazonS3FullAccess</p>
</li>
<li><p>AWSCloudFormationFullAccess</p>
</li>
<li><p>IAMFullAccess (temporarily, for setup)</p>
</li>
</ul>
</li>
<li><p>Click "Create user"</p>
</li>
<li><p>Go to "Security credentials" tab</p>
</li>
<li><p>Click "Create access key"</p>
</li>
<li><p>Select "Command Line Interface (CLI)"</p>
</li>
<li><p>Copy the Access Key ID and Secret Access Key</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767332238034/7f20663d-25d5-47fe-bc71-96f44b91ed90.png" alt class="image--center mx-auto" /></p>
<p><strong>1.3 Configure AWS CLI on Your Machine</strong></p>
<pre><code class="lang-bash"># Install AWS CLI (if not installed)
# For Mac:
brew install awscli
# For Ubuntu:
sudo apt-get install awscli
# For Windows (PowerShell):
winget install -e --id Amazon.AWSCLI

# Configure AWS CLI
aws configure
# Enter:
# AWS Access Key ID: [paste from step above]
# AWS Secret Access Key: [paste from step above]
# Default region: us-east-1 (or your preferred region)
# Default output format: json
</code></pre>
<p><strong>1.4 Verify the Configuration</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767332337494/7c156b18-94e3-467c-b4d1-fd698af5447d.png" alt class="image--center mx-auto" /></p>
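<p>Before moving on, confirm that <code>aws configure</code> actually saved your credentials (running <code>aws sts get-caller-identity</code> is the quickest live check). The credentials file it writes at <code>~/.aws/credentials</code> is plain INI, so you can also inspect it from Python; the <code>profile_keys</code> helper below is just an illustrative sketch, not part of any AWS SDK:</p>
<pre><code class="lang-python">import configparser

def profile_keys(ini_text, profile="default"):
    """Return the credential keys stored for a profile in credentials-file text."""
    cp = configparser.ConfigParser()
    cp.read_string(ini_text)
    return sorted(cp[profile].keys()) if cp.has_section(profile) else []

# Demo with an in-memory sample shaped like ~/.aws/credentials:
sample = """[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = examplesecret
"""
print(profile_keys(sample))  # ['aws_access_key_id', 'aws_secret_access_key']
</code></pre>
<p>In practice you'd pass the real contents of <code>~/.aws/credentials</code> instead of the inline sample.</p>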
<h3 id="heading-step-2-request-model-access"><strong>Step 2: Request Model Access</strong></h3>
<p><strong>2.1 Get Llama 3 Access on Hugging Face</strong></p>
<pre><code class="lang-plaintext"># 1. Go to https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
# 2. Click "Request Access"
# 3. Fill the form (use your real details)
# 4. Wait for approval (usually within hours)

# Alternative: Use a different open model that doesn't require approval
# We'll use "mistralai/Mistral-7B-Instruct-v0.1" for this tutorial
# No approval needed!
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767332499118/bda85a0a-c766-4216-8f6a-4d69870e5a35.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>2.2 Create Hugging Face Token (For Llama 3 if approved)</strong></p>
<pre><code class="lang-plaintext">1. Go to https://huggingface.co
2. Sign up/login
3. Click profile → Settings → Access Tokens
4. Click "New token"
5. Name: aws-sagemaker
6. Role: Write (for uploading models if needed)
7. Copy the token
</code></pre>
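<p>Avoid hard-coding the token in scripts. The <code>huggingface_hub</code> library picks it up from the <code>HF_TOKEN</code> environment variable, so a small lookup helper is all most scripts need (the name <code>get_hf_token</code> is ours for illustration, not an official API):</p>
<pre><code class="lang-python">import os

def get_hf_token():
    """Read the Hugging Face token from the environment rather than hard-coding it."""
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")

os.environ["HF_TOKEN"] = "hf_example_token"  # in practice, export this from your shell profile
print(get_hf_token())  # hf_example_token
</code></pre>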
<h3 id="heading-step-3-prepare-your-local-environment"><strong>Step 3: Prepare Your Local Environment</strong></h3>
<p><strong>3.1 Create Project Directory Structure</strong></p>
<pre><code class="lang-bash">mkdir llama3-finetune-tutorial
<span class="hljs-built_in">cd</span> llama3-finetune-tutorial

<span class="hljs-comment"># Create the directory structure</span>
mkdir -p scripts data configs outputs
mkdir -p docker train deploy monitor
</code></pre>
<p><strong>3.2 Create Virtual Environment &amp; Install Dependencies</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create virtual environment</span>
python -m venv venv

<span class="hljs-comment"># Activate it</span>
<span class="hljs-comment"># On Mac/Linux:</span>
<span class="hljs-built_in">source</span> venv/bin/activate

<span class="hljs-comment"># On Windows:</span>
<span class="hljs-comment"># venv\Scripts\activate</span>

<span class="hljs-comment"># Install required packages</span>
pip install --upgrade pip
pip install boto3 sagemaker awscli
pip install transformers==4.36.0
pip install datasets==2.14.0
pip install peft==0.7.0
pip install accelerate==0.25.0
pip install bitsandbytes==0.41.3
pip install torch==2.1.0
pip install scikit-learn
</code></pre>
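<p>With this many pinned packages, a quick import check now saves a failed training run later. This sketch (the <code>missing_packages</code> helper is illustrative, not from any library) reports anything that didn't install cleanly:</p>
<pre><code class="lang-python">import importlib.util

REQUIRED = ["boto3", "sagemaker", "transformers", "datasets",
            "peft", "accelerate", "bitsandbytes", "torch", "sklearn"]

def missing_packages(names):
    """Return the subset of names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_packages(REQUIRED) or "all packages installed")
</code></pre>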
<h3 id="heading-step-4-prepare-training-data"><strong>Step 4: Prepare Training Data</strong></h3>
<p><strong>4.1 Create Sample Training Data</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create data directory</span>
<span class="hljs-built_in">cd</span> data
</code></pre>
<p>Create <code>prepare_data.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment"># data/prepare_data.py</span>
<span class="hljs-keyword">import</span> json

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_sample_data</span>():</span>
    <span class="hljs-string">"""Create example training data for customer service chatbot"""</span>

    examples = [
        {
            <span class="hljs-string">"instruction"</span>: <span class="hljs-string">"How do I reset my account password?"</span>,
            <span class="hljs-string">"input"</span>: <span class="hljs-string">""</span>,
            <span class="hljs-string">"output"</span>: <span class="hljs-string">"To reset your password, visit our login page and click 'Forgot Password'. Enter your email address and check your inbox for a reset link. The link expires in 24 hours. If you don't receive the email, check your spam folder or contact support@ourcompany.com."</span>
        },
        {
            <span class="hljs-string">"instruction"</span>: <span class="hljs-string">"What's your refund policy?"</span>,
            <span class="hljs-string">"input"</span>: <span class="hljs-string">"Customer purchased item 2 weeks ago"</span>,
            <span class="hljs-string">"output"</span>: <span class="hljs-string">"We offer a 30-day money-back guarantee for all products. Since your purchase was 2 weeks ago, you're eligible for a full refund. Please go to 'My Orders' in your account, select the item, and click 'Request Refund'. Refunds are processed within 5-7 business days."</span>
        },
        {
            <span class="hljs-string">"instruction"</span>: <span class="hljs-string">"Do you offer international shipping?"</span>,
            <span class="hljs-string">"input"</span>: <span class="hljs-string">""</span>,
            <span class="hljs-string">"output"</span>: <span class="hljs-string">"Yes, we ship to over 50 countries. International shipping takes 7-14 business days and costs $25. Additional customs fees may apply depending on your country's regulations. You can see exact shipping costs at checkout."</span>
        },
        {
            <span class="hljs-string">"instruction"</span>: <span class="hljs-string">"How do I contact customer support?"</span>,
            <span class="hljs-string">"input"</span>: <span class="hljs-string">""</span>,
            <span class="hljs-string">"output"</span>: <span class="hljs-string">"You can contact our support team in three ways: 1) Email: support@ourcompany.com (response within 24 hours), 2) Live Chat: Available Mon-Fri 9AM-6PM EST on our website, 3) Phone: 1-800-555-HELP (Mon-Fri 8AM-8PM EST)."</span>
        },
        {
            <span class="hljs-string">"instruction"</span>: <span class="hljs-string">"What payment methods do you accept?"</span>,
            <span class="hljs-string">"input"</span>: <span class="hljs-string">"Customer is from Europe"</span>,
            <span class="hljs-string">"output"</span>: <span class="hljs-string">"We accept Visa, MasterCard, American Express, PayPal, and Apple Pay. For European customers, we also support SEPA transfers and iDEAL. All payments are processed securely through PCI-compliant systems."</span>
        }
    ]

    <span class="hljs-comment"># Create 100 synthetic examples (for demo - in real life, use real data)</span>
    all_examples = []
    base_scenarios = examples.copy()

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">100</span>):
        base = base_scenarios[i % len(base_scenarios)]
        new_example = base.copy()

        <span class="hljs-comment"># Add some variation</span>
        <span class="hljs-keyword">if</span> <span class="hljs-string">"password"</span> <span class="hljs-keyword">in</span> new_example[<span class="hljs-string">"instruction"</span>].lower():
            variations = [
                <span class="hljs-string">"I forgot my password"</span>,
                <span class="hljs-string">"Can't login to my account"</span>,
                <span class="hljs-string">"Need to change my password"</span>
            ]
            new_example[<span class="hljs-string">"instruction"</span>] = variations[i % len(variations)]

        <span class="hljs-comment"># Format for training</span>
        text = <span class="hljs-string">f"### Instruction:\n<span class="hljs-subst">{new_example[<span class="hljs-string">'instruction'</span>]}</span>\n\n"</span>
        <span class="hljs-keyword">if</span> new_example[<span class="hljs-string">'input'</span>]:
            text += <span class="hljs-string">f"### Input:\n<span class="hljs-subst">{new_example[<span class="hljs-string">'input'</span>]}</span>\n\n"</span>
        text += <span class="hljs-string">f"### Response:\n<span class="hljs-subst">{new_example[<span class="hljs-string">'output'</span>]}</span>"</span>

        all_examples.append({<span class="hljs-string">"text"</span>: text})

    <span class="hljs-comment"># Save to JSON</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'train.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        json.dump(all_examples, f, indent=<span class="hljs-number">2</span>)

    <span class="hljs-comment"># Also save in instruction format</span>
    instruction_examples = []
    <span class="hljs-keyword">for</span> ex <span class="hljs-keyword">in</span> all_examples:
        lines = ex[<span class="hljs-string">'text'</span>].split(<span class="hljs-string">'\n'</span>)
        instruction = lines[<span class="hljs-number">1</span>].strip()  <span class="hljs-comment"># instruction text is on the line after the marker</span>
        response = lines[<span class="hljs-number">-1</span>].strip()  <span class="hljs-comment"># response text is the final line</span>
        instruction_examples.append({
            <span class="hljs-string">"instruction"</span>: instruction,
            <span class="hljs-string">"response"</span>: response
        })

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'instructions.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        json.dump(instruction_examples, f, indent=<span class="hljs-number">2</span>)

    print(<span class="hljs-string">f"Created <span class="hljs-subst">{len(all_examples)}</span> training examples"</span>)
    print(<span class="hljs-string">f"Sample: <span class="hljs-subst">{all_examples[<span class="hljs-number">0</span>][<span class="hljs-string">'text'</span>][:<span class="hljs-number">200</span>]}</span>..."</span>)

    <span class="hljs-keyword">return</span> all_examples

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    create_sample_data()
</code></pre>
<p>Run it:</p>
<pre><code class="lang-bash">python prepare_data.py
<span class="hljs-built_in">cd</span> ..  <span class="hljs-comment"># back to the project root for the upload steps below</span>
</code></pre>
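<p>If you want to sanity-check the generated records, splitting on the section markers is a robust way to recover the instruction/response pairs. The <code>parse_example</code> helper below is an illustrative sketch, not part of the tutorial's scripts:</p>
<pre><code class="lang-python">def parse_example(text):
    """Recover the instruction/response pair from the '### Instruction:' prompt format."""
    instruction = text.split("### Instruction:\n", 1)[1].split("\n\n", 1)[0]
    response = text.split("### Response:\n", 1)[1]
    return {"instruction": instruction.strip(), "response": response.strip()}

sample = "### Instruction:\nI forgot my password\n\n### Response:\nUse the reset link."
print(parse_example(sample))  # {'instruction': 'I forgot my password', 'response': 'Use the reset link.'}
</code></pre>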
<p><strong>4.2 Create Validation Data</strong><br />Create <code>data/validation.json</code>:</p>
<pre><code class="lang-json">[
  {
    <span class="hljs-attr">"text"</span>: <span class="hljs-string">"### Instruction:\nHow do I track my order?\n\n### Response:\nYou can track your order by logging into your account and going to 'Order History'. Click on the order number to see tracking details. You'll receive tracking emails at every major shipment milestone. For urgent inquiries, contact support@ourcompany.com."</span>
  },
  {
    <span class="hljs-attr">"text"</span>: <span class="hljs-string">"### Instruction:\nDo you have a mobile app?\n\n### Input:\nCustomer uses iPhone\n\n### Response:\nYes, we have both iOS and Android apps. You can download our iOS app from the App Store by searching 'OurCompany'. The app includes all website features plus push notifications for order updates and exclusive mobile-only deals."</span>
  }
]
</code></pre>
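<p>A quick way to catch malformed records before uploading is to check that every entry has a <code>text</code> field in the expected prompt format. <code>validate_records</code> is a hypothetical helper shown only as a sketch:</p>
<pre><code class="lang-python">import json

def validate_records(records):
    """Flag indices whose 'text' field doesn't follow the prompt format."""
    problems = []
    for i, rec in enumerate(records):
        text = rec.get("text", "")
        if not text.startswith("### Instruction:") or "### Response:" not in text:
            problems.append(i)
    return problems

records = json.loads('[{"text": "### Instruction:\\nHi\\n\\n### Response:\\nHello"}, {"text": "broken"}]')
print(validate_records(records))  # [1]
</code></pre>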
<h2 id="heading-phase-2-sagemaker-setup-20-minutes"><strong>Phase 2: SageMaker Setup (20 minutes)</strong></h2>
<h3 id="heading-step-5-create-s3-bucket-for-data-amp-models"><strong>Step 5: Create S3 Bucket for Data &amp; Models</strong></h3>
<p><strong>5.1 Create Bucket</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create unique bucket name (must be globally unique)</span>
BUCKET_NAME=<span class="hljs-string">"llama3-finetune-<span class="hljs-subst">$(date +%s)</span>-<span class="hljs-variable">$RANDOM</span>"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Bucket name: <span class="hljs-variable">$BUCKET_NAME</span>"</span>

<span class="hljs-comment"># Create bucket</span>
aws s3 mb s3://<span class="hljs-variable">$BUCKET_NAME</span>

<span class="hljs-comment"># Create folder structure</span>
aws s3api put-object --bucket <span class="hljs-variable">$BUCKET_NAME</span> --key data/train/
aws s3api put-object --bucket <span class="hljs-variable">$BUCKET_NAME</span> --key data/validation/
aws s3api put-object --bucket <span class="hljs-variable">$BUCKET_NAME</span> --key models/
aws s3api put-object --bucket <span class="hljs-variable">$BUCKET_NAME</span> --key outputs/
</code></pre>
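<p>The same uniqueness trick works from Python. S3 bucket names must be 3-63 characters of lowercase letters, digits, hyphens, and periods, starting and ending with a letter or digit; a small generator (a hypothetical helper, sketched here) can enforce that up front:</p>
<pre><code class="lang-python">import re
import time
import uuid

# S3 rules: 3-63 chars; lowercase letters, digits, hyphens, periods; alphanumeric at both ends
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def make_bucket_name(prefix="llama3-finetune"):
    """Generate an S3-legal bucket name that's very unlikely to collide."""
    name = f"{prefix}-{int(time.time())}-{uuid.uuid4().hex[:8]}"
    if not BUCKET_RE.match(name):
        raise ValueError(f"illegal bucket name: {name}")
    return name

print(make_bucket_name())
</code></pre>
<p>You could then create the bucket with <code>boto3</code> or hand the name to the CLI commands above.</p>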
<p><strong>5.2 Upload Data to S3</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Upload training data</span>
aws s3 cp data/train.json s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/train/train.json
aws s3 cp data/validation.json s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/validation/validation.json

<span class="hljs-comment"># Verify upload</span>
aws s3 ls s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/train/
aws s3 ls s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/validation/
</code></pre>
<h3 id="heading-step-6-create-sagemaker-training-script"><strong>Step 6: Create SageMaker Training Script</strong></h3>
<p>Create <code>scripts/train.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># scripts/train.py</span>

<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path

<span class="hljs-comment"># Add project root to path</span>
sys.path.append(str(Path(__file__).parent.parent))

<span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
<span class="hljs-keyword">from</span> peft <span class="hljs-keyword">import</span> LoraConfig, get_peft_model, prepare_model_for_kbit_training
<span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset

<span class="hljs-comment"># Set up logging</span>
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">LLMTrainer</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, config_path=<span class="hljs-string">"configs/training_config.json"</span></span>):</span>
        <span class="hljs-string">"""Initialize trainer with configuration"""</span>
        <span class="hljs-keyword">with</span> open(config_path, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> f:
            self.config = json.load(f)

        logger.info(<span class="hljs-string">f"Configuration loaded: <span class="hljs-subst">{self.config}</span>"</span>)

        <span class="hljs-comment"># Set device</span>
        self.device = <span class="hljs-string">"cuda"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>
        logger.info(<span class="hljs-string">f"Using device: <span class="hljs-subst">{self.device}</span>"</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">load_model_and_tokenizer</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Load base model and tokenizer"""</span>
        logger.info(<span class="hljs-string">f"Loading model: <span class="hljs-subst">{self.config[<span class="hljs-string">'model_name'</span>]}</span>"</span>)

        <span class="hljs-comment"># Configure 4-bit quantization to save memory</span>
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=<span class="hljs-literal">True</span>,
            bnb_4bit_quant_type=<span class="hljs-string">"nf4"</span>,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=<span class="hljs-literal">True</span>
        )

        <span class="hljs-comment"># Load model with quantization</span>
        self.model = AutoModelForCausalLM.from_pretrained(
            self.config[<span class="hljs-string">"model_name"</span>],
            quantization_config=bnb_config,
            device_map=<span class="hljs-string">"auto"</span>,
            trust_remote_code=<span class="hljs-literal">True</span>,
            use_auth_token=<span class="hljs-literal">True</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"llama"</span> <span class="hljs-keyword">in</span> self.config[<span class="hljs-string">"model_name"</span>].lower() <span class="hljs-keyword">else</span> <span class="hljs-literal">False</span>
        )

        <span class="hljs-comment"># Load tokenizer</span>
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.config[<span class="hljs-string">"model_name"</span>],
            trust_remote_code=<span class="hljs-literal">True</span>,
            use_auth_token=<span class="hljs-literal">True</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"llama"</span> <span class="hljs-keyword">in</span> self.config[<span class="hljs-string">"model_name"</span>].lower() <span class="hljs-keyword">else</span> <span class="hljs-literal">False</span>
        )

        <span class="hljs-comment"># Set padding token</span>
        <span class="hljs-keyword">if</span> self.tokenizer.pad_token <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        logger.info(<span class="hljs-string">f"Model loaded: <span class="hljs-subst">{self.config[<span class="hljs-string">'model_name'</span>]}</span>"</span>)
        logger.info(<span class="hljs-string">f"Tokenizer vocab size: <span class="hljs-subst">{len(self.tokenizer)}</span>"</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">prepare_model_for_training</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Apply LoRA configuration to model"""</span>
        logger.info(<span class="hljs-string">"Preparing model for LoRA training..."</span>)

        <span class="hljs-comment"># Prepare model for k-bit training</span>
        self.model = prepare_model_for_kbit_training(self.model)

        <span class="hljs-comment"># Configure LoRA</span>
        lora_config = LoraConfig(
            r=self.config[<span class="hljs-string">"lora_r"</span>],
            lora_alpha=self.config[<span class="hljs-string">"lora_alpha"</span>],
            target_modules=self.config[<span class="hljs-string">"lora_target_modules"</span>],
            lora_dropout=self.config[<span class="hljs-string">"lora_dropout"</span>],
            bias=<span class="hljs-string">"none"</span>,
            task_type=<span class="hljs-string">"CAUSAL_LM"</span>
        )

        <span class="hljs-comment"># Apply LoRA</span>
        self.model = get_peft_model(self.model, lora_config)

        <span class="hljs-comment"># Print trainable parameters</span>
        self.model.print_trainable_parameters()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">load_and_tokenize_data</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Load and tokenize training data"""</span>
        logger.info(<span class="hljs-string">"Loading training data..."</span>)

        <span class="hljs-comment"># Get data paths from environment (SageMaker sets these)</span>
        train_data_path = os.environ.get(<span class="hljs-string">'SM_CHANNEL_TRAIN'</span>, <span class="hljs-string">'data/train'</span>)
        val_data_path = os.environ.get(<span class="hljs-string">'SM_CHANNEL_VALIDATION'</span>, <span class="hljs-string">'data/validation'</span>)

        logger.info(<span class="hljs-string">f"Train data path: <span class="hljs-subst">{train_data_path}</span>"</span>)
        logger.info(<span class="hljs-string">f"Validation data path: <span class="hljs-subst">{val_data_path}</span>"</span>)

        <span class="hljs-comment"># Load datasets</span>
        train_files = [str(f) <span class="hljs-keyword">for</span> f <span class="hljs-keyword">in</span> Path(train_data_path).glob(<span class="hljs-string">"*.json"</span>)]
        val_files = [str(f) <span class="hljs-keyword">for</span> f <span class="hljs-keyword">in</span> Path(val_data_path).glob(<span class="hljs-string">"*.json"</span>)]

        train_dataset = load_dataset(<span class="hljs-string">'json'</span>, data_files=train_files)
        val_dataset = load_dataset(<span class="hljs-string">'json'</span>, data_files=val_files) <span class="hljs-keyword">if</span> val_files <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>

        <span class="hljs-comment"># Tokenization function</span>
        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">tokenize_function</span>(<span class="hljs-params">examples</span>):</span>
            <span class="hljs-keyword">return</span> self.tokenizer(
                examples[<span class="hljs-string">"text"</span>],
                truncation=<span class="hljs-literal">True</span>,
                padding=<span class="hljs-string">"max_length"</span>,
                max_length=self.config[<span class="hljs-string">"max_length"</span>]
            )

        <span class="hljs-comment"># Tokenize datasets</span>
        tokenized_train = train_dataset.map(
            tokenize_function,
            batched=<span class="hljs-literal">True</span>,
            remove_columns=train_dataset[<span class="hljs-string">"train"</span>].column_names
        )

        <span class="hljs-keyword">if</span> val_dataset:
            tokenized_val = val_dataset.map(
                tokenize_function,
                batched=<span class="hljs-literal">True</span>,
                remove_columns=val_dataset[<span class="hljs-string">"train"</span>].column_names
            )
        <span class="hljs-keyword">else</span>:
            tokenized_val = <span class="hljs-literal">None</span>

        logger.info(<span class="hljs-string">f"Training samples: <span class="hljs-subst">{len(tokenized_train[<span class="hljs-string">'train'</span>])}</span>"</span>)
        <span class="hljs-keyword">if</span> tokenized_val:
            logger.info(<span class="hljs-string">f"Validation samples: <span class="hljs-subst">{len(tokenized_val[<span class="hljs-string">'train'</span>])}</span>"</span>)

        <span class="hljs-keyword">return</span> tokenized_train[<span class="hljs-string">"train"</span>], (tokenized_val[<span class="hljs-string">"train"</span>] <span class="hljs-keyword">if</span> tokenized_val <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Main training loop"""</span>
        logger.info(<span class="hljs-string">"Starting training process..."</span>)

        <span class="hljs-comment"># Load model and tokenizer</span>
        self.load_model_and_tokenizer()

        <span class="hljs-comment"># Prepare for LoRA training</span>
        self.prepare_model_for_training()

        <span class="hljs-comment"># Load and tokenize data</span>
        train_dataset, val_dataset = self.load_and_tokenize_data()

        <span class="hljs-comment"># Create data collator</span>
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=<span class="hljs-literal">False</span>
        )

        <span class="hljs-comment"># Set output directory</span>
        output_dir = <span class="hljs-string">"/opt/ml/model"</span>  <span class="hljs-comment"># SageMaker expects this</span>

        <span class="hljs-comment"># Configure training arguments</span>
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=self.config[<span class="hljs-string">"num_epochs"</span>],
            per_device_train_batch_size=self.config[<span class="hljs-string">"batch_size"</span>],
            per_device_eval_batch_size=self.config[<span class="hljs-string">"batch_size"</span>],
            gradient_accumulation_steps=self.config[<span class="hljs-string">"gradient_accumulation_steps"</span>],
            warmup_steps=self.config[<span class="hljs-string">"warmup_steps"</span>],
            logging_steps=self.config[<span class="hljs-string">"logging_steps"</span>],
            save_steps=self.config[<span class="hljs-string">"save_steps"</span>],
            eval_steps=self.config[<span class="hljs-string">"eval_steps"</span>] <span class="hljs-keyword">if</span> val_dataset <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
            evaluation_strategy=<span class="hljs-string">"steps"</span> <span class="hljs-keyword">if</span> val_dataset <span class="hljs-keyword">else</span> <span class="hljs-string">"no"</span>,
            save_strategy=<span class="hljs-string">"steps"</span>,
            save_total_limit=<span class="hljs-number">2</span>,
            load_best_model_at_end=<span class="hljs-literal">True</span> <span class="hljs-keyword">if</span> val_dataset <span class="hljs-keyword">else</span> <span class="hljs-literal">False</span>,
            metric_for_best_model=<span class="hljs-string">"eval_loss"</span> <span class="hljs-keyword">if</span> val_dataset <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
            greater_is_better=<span class="hljs-literal">False</span> <span class="hljs-keyword">if</span> val_dataset <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
            learning_rate=self.config[<span class="hljs-string">"learning_rate"</span>],
            weight_decay=self.config[<span class="hljs-string">"weight_decay"</span>],
            fp16=<span class="hljs-literal">False</span>,
            bf16=self.config.get(<span class="hljs-string">"bf16"</span>, <span class="hljs-literal">False</span>),
            gradient_checkpointing=self.config[<span class="hljs-string">"gradient_checkpointing"</span>],
            optim=self.config[<span class="hljs-string">"optimizer"</span>],
            report_to=[<span class="hljs-string">"tensorboard"</span>],
            ddp_find_unused_parameters=<span class="hljs-literal">False</span>,
            remove_unused_columns=<span class="hljs-literal">False</span>
        )

        <span class="hljs-comment"># Initialize Trainer</span>
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator,
        )

        <span class="hljs-comment"># Start training</span>
        logger.info(<span class="hljs-string">"Training started..."</span>)
        train_result = trainer.train()

        <span class="hljs-comment"># Save model</span>
        trainer.save_model()
        self.tokenizer.save_pretrained(output_dir)

        <span class="hljs-comment"># Save training metrics</span>
        metrics = train_result.metrics
        trainer.log_metrics(<span class="hljs-string">"train"</span>, metrics)
        trainer.save_metrics(<span class="hljs-string">"train"</span>, metrics)

        <span class="hljs-keyword">if</span> val_dataset:
            eval_metrics = trainer.evaluate()
            trainer.log_metrics(<span class="hljs-string">"eval"</span>, eval_metrics)
            trainer.save_metrics(<span class="hljs-string">"eval"</span>, eval_metrics)

        logger.info(<span class="hljs-string">f"Training completed! Model saved to <span class="hljs-subst">{output_dir}</span>"</span>)

        <span class="hljs-keyword">return</span> metrics

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-string">"""Main entry point"""</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Check if running in SageMaker</span>
        sm_training_env = os.environ.get(<span class="hljs-string">'SM_TRAINING_ENV'</span>, <span class="hljs-string">''</span>)
        <span class="hljs-keyword">if</span> sm_training_env:
            logger.info(<span class="hljs-string">f"Running in SageMaker environment: <span class="hljs-subst">{sm_training_env}</span>"</span>)

        <span class="hljs-comment"># Initialize and run trainer</span>
        trainer = LLMTrainer()
        metrics = trainer.train()

        logger.info(<span class="hljs-string">"Training completed successfully!"</span>)
        logger.info(<span class="hljs-string">f"Final metrics: <span class="hljs-subst">{metrics}</span>"</span>)

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        logger.error(<span class="hljs-string">f"Training failed with error: <span class="hljs-subst">{str(e)}</span>"</span>)
        <span class="hljs-keyword">raise</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
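<p>After a run, <code>trainer.save_metrics</code> writes the metrics to JSON files alongside the model (<code>train_results.json</code>, plus a combined <code>all_results.json</code>, per the Hugging Face Trainer convention), so you can inspect a finished job without reloading anything. A small sketch, assuming the SageMaker default output directory <code>/opt/ml/model</code>:</p>
<pre><code class="lang-python">import json
from pathlib import Path

def read_metrics(output_dir):
    """Load the metrics JSON that Trainer.save_metrics leaves next to the model."""
    path = Path(output_dir) / "train_results.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text())

metrics = read_metrics("/opt/ml/model")  # SageMaker's default model output dir
print(metrics.get("train_loss", "no metrics found"))
</code></pre>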
<p>Create <code>configs/training_config.json</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"model_name"</span>: <span class="hljs-string">"mistralai/Mistral-7B-Instruct-v0.1"</span>,
  <span class="hljs-attr">"num_epochs"</span>: <span class="hljs-number">3</span>,
  <span class="hljs-attr">"batch_size"</span>: <span class="hljs-number">2</span>,
  <span class="hljs-attr">"gradient_accumulation_steps"</span>: <span class="hljs-number">4</span>,
  <span class="hljs-attr">"learning_rate"</span>: <span class="hljs-number">2e-4</span>,
  <span class="hljs-attr">"weight_decay"</span>: <span class="hljs-number">0.01</span>,
  <span class="hljs-attr">"warmup_steps"</span>: <span class="hljs-number">100</span>,
  <span class="hljs-attr">"logging_steps"</span>: <span class="hljs-number">50</span>,
  <span class="hljs-attr">"save_steps"</span>: <span class="hljs-number">100</span>,
  <span class="hljs-attr">"eval_steps"</span>: <span class="hljs-number">100</span>,
  <span class="hljs-attr">"max_length"</span>: <span class="hljs-number">512</span>,
  <span class="hljs-attr">"lora_r"</span>: <span class="hljs-number">16</span>,
  <span class="hljs-attr">"lora_alpha"</span>: <span class="hljs-number">32</span>,
  <span class="hljs-attr">"lora_dropout"</span>: <span class="hljs-number">0.1</span>,
  <span class="hljs-attr">"lora_target_modules"</span>: [<span class="hljs-string">"q_proj"</span>, <span class="hljs-string">"k_proj"</span>, <span class="hljs-string">"v_proj"</span>, <span class="hljs-string">"o_proj"</span>],
  <span class="hljs-attr">"gradient_checkpointing"</span>: <span class="hljs-literal">true</span>,
  <span class="hljs-attr">"bf16"</span>: <span class="hljs-literal">true</span>,
  <span class="hljs-attr">"optimizer"</span>: <span class="hljs-string">"adamw_8bit"</span>
}
</code></pre>
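<p>One number worth checking before you launch: with <code>batch_size</code> 2 and <code>gradient_accumulation_steps</code> 4, each optimizer step sees an effective batch of 8 examples. A quick sanity check of the schedule this config implies (the 1,000-example dataset size below is a hypothetical placeholder, not something from the config):</p>
<pre><code class="lang-python"># Effective batch size and optimizer steps implied by training_config.json.
def training_schedule(num_examples, batch_size=2, grad_accum=4, num_epochs=3):
    effective_batch = batch_size * grad_accum          # examples per optimizer step
    steps_per_epoch = num_examples // effective_batch  # optimizer steps per epoch
    total_steps = steps_per_epoch * num_epochs
    return effective_batch, steps_per_epoch, total_steps

# Hypothetical 1,000-example dataset
print(training_schedule(1000))  # (8, 125, 375)
</code></pre>
<p>On a dataset that small, the 100-step <code>warmup_steps</code> would cover most of the first epoch, so scale it down for tiny corpora.</p>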
<p>Create <code>scripts/requirements.txt</code>:</p>
<pre><code class="lang-plaintext">transformers==4.36.0
datasets==2.14.0
accelerate==0.25.0
peft==0.7.0
bitsandbytes==0.41.3
torch==2.1.0
scikit-learn
sentencepiece
protobuf
einops
</code></pre>
<h3 id="heading-step-7-create-sagemaker-entry-point-script"><strong>Step 7: Create SageMaker Entry Point Script</strong></h3>
<p>Create <code>scripts/sagemaker_entry.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># scripts/sagemaker_entry.py</span>

<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> subprocess
<span class="hljs-keyword">import</span> argparse

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">install_requirements</span>():</span>
    <span class="hljs-string">"""Install required packages"""</span>
    print(<span class="hljs-string">"Installing requirements..."</span>)
    subprocess.check_call([
        sys.executable, <span class="hljs-string">"-m"</span>, <span class="hljs-string">"pip"</span>, <span class="hljs-string">"install"</span>,
        <span class="hljs-string">"-r"</span>, <span class="hljs-string">"/opt/ml/code/requirements.txt"</span>
    ])

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    parser = argparse.ArgumentParser()
    parser.add_argument(
        <span class="hljs-string">"--train"</span>, 
        action=<span class="hljs-string">"store_true"</span>,
        help=<span class="hljs-string">"Run training"</span>
    )
    parser.add_argument(
        <span class="hljs-string">"--serve"</span>, 
        action=<span class="hljs-string">"store_true"</span>,
        help=<span class="hljs-string">"Run serving"</span>
    )

    args = parser.parse_args()

    <span class="hljs-keyword">if</span> args.train:
        <span class="hljs-comment"># Install dependencies first</span>
        install_requirements()

        <span class="hljs-comment"># Run training</span>
        print(<span class="hljs-string">"Starting training..."</span>)
        <span class="hljs-keyword">from</span> train <span class="hljs-keyword">import</span> main <span class="hljs-keyword">as</span> train_main
        train_main()

    <span class="hljs-keyword">elif</span> args.serve:
        print(<span class="hljs-string">"Serving mode - this would load the model for inference"</span>)
        <span class="hljs-comment"># For SageMaker deployment</span>
        <span class="hljs-keyword">pass</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
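<p>A note on how hyperparameters reach this entry point: SageMaker passes the estimator's <code>hyperparameters</code> dict to the container both as command-line arguments and as a JSON object in the <code>SM_HPS</code> environment variable (a SageMaker training-toolkit convention). A minimal sketch of reading them with defaults for local runs:</p>
<pre><code class="lang-python">import json
import os

def load_hyperparameters(defaults=None):
    """Merge SageMaker's SM_HPS env var (JSON) over local defaults."""
    hps = dict(defaults or {})
    raw = os.environ.get("SM_HPS")
    if raw:
        hps.update(json.loads(raw))  # values arrive as strings, e.g. {"num_epochs": "3"}
    return hps

# Local run: SM_HPS is unset, so the defaults are used as-is
print(load_hyperparameters({"num_epochs": "3", "lora_r": "16"}))
</code></pre>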
<h2 id="heading-phase-3-launch-training-15-minutes"><strong>Phase 3: Launch Training (15 minutes)</strong></h2>
<h3 id="heading-step-8-create-launch-script"><strong>Step 8: Create Launch Script</strong></h3>
<p>Create <code>launch_training.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># launch_training.py</span>

<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">from</span> sagemaker.huggingface <span class="hljs-keyword">import</span> HuggingFace

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_training_job</span>():</span>
    <span class="hljs-string">"""Create and launch SageMaker training job"""</span>

    <span class="hljs-comment"># Configuration</span>
    config = {
        <span class="hljs-string">"job_name"</span>: <span class="hljs-string">f"llama-finetune-<span class="hljs-subst">{datetime.now().strftime(<span class="hljs-string">'%Y%m%d-%H%M%S'</span>)}</span>"</span>,
        <span class="hljs-string">"instance_type"</span>: <span class="hljs-string">"ml.g5.2xlarge"</span>,  <span class="hljs-comment"># A10G (24 GB): cheapest instance with enough GPU memory for a 7B LoRA run</span>
        <span class="hljs-string">"instance_count"</span>: <span class="hljs-number">1</span>,
        <span class="hljs-string">"volume_size"</span>: <span class="hljs-number">200</span>,  <span class="hljs-comment"># GB</span>
        <span class="hljs-string">"max_run_hours"</span>: <span class="hljs-number">4</span>,
        <span class="hljs-string">"use_spot_instances"</span>: <span class="hljs-literal">True</span>,
        <span class="hljs-string">"max_wait_hours"</span>: <span class="hljs-number">8</span>,
        <span class="hljs-string">"bucket_name"</span>: <span class="hljs-string">"llama3-finetune-1234567890"</span>,  <span class="hljs-comment"># Your bucket from earlier</span>
        <span class="hljs-string">"role_arn"</span>: <span class="hljs-literal">None</span>,  <span class="hljs-comment"># Will get from SageMaker</span>
    }

    <span class="hljs-comment"># Initialize session</span>
    session = boto3.Session()
    sagemaker_session = session.client(<span class="hljs-string">'sagemaker'</span>)  <span class="hljs-comment"># SageMaker API client (reuses the session above)</span>

    <span class="hljs-comment"># Get SageMaker execution role</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> config[<span class="hljs-string">"role_arn"</span>]:
        <span class="hljs-comment"># Try to get default role</span>
        <span class="hljs-keyword">try</span>:
            iam = boto3.client(<span class="hljs-string">'iam'</span>)
            roles = iam.list_roles(PathPrefix=<span class="hljs-string">'/service-role/'</span>)
            <span class="hljs-keyword">for</span> role <span class="hljs-keyword">in</span> roles[<span class="hljs-string">'Roles'</span>]:
                <span class="hljs-keyword">if</span> <span class="hljs-string">'AmazonSageMaker-ExecutionRole'</span> <span class="hljs-keyword">in</span> role[<span class="hljs-string">'RoleName'</span>]:
                    config[<span class="hljs-string">"role_arn"</span>] = role[<span class="hljs-string">'Arn'</span>]
                    <span class="hljs-keyword">break</span>
        <span class="hljs-keyword">except</span> Exception:
            <span class="hljs-keyword">pass</span>  <span class="hljs-comment"># fall through to the manual role prompt below</span>

        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> config[<span class="hljs-string">"role_arn"</span>]:
            print(<span class="hljs-string">"No SageMaker role found. Creating one..."</span>)
            <span class="hljs-comment"># You'll need to create this through AWS Console first</span>
            print(<span class="hljs-string">"Please create a SageMaker execution role:"</span>)
            print(<span class="hljs-string">"1. Go to IAM Console"</span>)
            print(<span class="hljs-string">"2. Create role"</span>)
            print(<span class="hljs-string">"3. Select 'SageMaker' as use case"</span>)
            print(<span class="hljs-string">"4. Attach policies: AmazonSageMakerFullAccess, AmazonS3FullAccess"</span>)
            print(<span class="hljs-string">"5. Name: AmazonSageMaker-ExecutionRole"</span>)
            print(<span class="hljs-string">"6. Copy the ARN and paste it below"</span>)
            config[<span class="hljs-string">"role_arn"</span>] = input(<span class="hljs-string">"Enter SageMaker Execution Role ARN: "</span>)

    <span class="hljs-comment"># Create HuggingFace estimator</span>
    print(<span class="hljs-string">f"Creating training job: <span class="hljs-subst">{config[<span class="hljs-string">'job_name'</span>]}</span>"</span>)

    <span class="hljs-comment"># Hyperparameters</span>
    hyperparameters = {
        <span class="hljs-string">"model_name"</span>: <span class="hljs-string">"mistralai/Mistral-7B-Instruct-v0.1"</span>,
        <span class="hljs-string">"num_epochs"</span>: <span class="hljs-string">"3"</span>,
        <span class="hljs-string">"batch_size"</span>: <span class="hljs-string">"2"</span>,
        <span class="hljs-string">"learning_rate"</span>: <span class="hljs-string">"2e-4"</span>,
        <span class="hljs-string">"lora_r"</span>: <span class="hljs-string">"16"</span>,
    }

    <span class="hljs-comment"># Environment variables</span>
    environment = {
        <span class="hljs-string">"HF_TOKEN"</span>: os.environ.get(<span class="hljs-string">"HF_TOKEN"</span>, <span class="hljs-string">""</span>),  <span class="hljs-comment"># For Llama 3 access</span>
        <span class="hljs-string">"MODEL_CACHE"</span>: <span class="hljs-string">"/opt/ml/model"</span>,
    }

    <span class="hljs-comment"># Create estimator</span>
    estimator = HuggingFace(
        entry_point=<span class="hljs-string">"sagemaker_entry.py"</span>,
        source_dir=<span class="hljs-string">"scripts"</span>,
        instance_type=config[<span class="hljs-string">"instance_type"</span>],
        instance_count=config[<span class="hljs-string">"instance_count"</span>],
        volume_size=config[<span class="hljs-string">"volume_size"</span>],
        role=config[<span class="hljs-string">"role_arn"</span>],
        transformers_version=<span class="hljs-string">"4.36.0"</span>,
        pytorch_version=<span class="hljs-string">"2.1.0"</span>,
        py_version=<span class="hljs-string">"py310"</span>,
        hyperparameters=hyperparameters,
        environment=environment,
        max_run=config[<span class="hljs-string">"max_run_hours"</span>] * <span class="hljs-number">3600</span>,
        use_spot_instances=config[<span class="hljs-string">"use_spot_instances"</span>],
        max_wait=config[<span class="hljs-string">"max_wait_hours"</span>] * <span class="hljs-number">3600</span> <span class="hljs-keyword">if</span> config[<span class="hljs-string">"use_spot_instances"</span>] <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>,
        output_path=<span class="hljs-string">f"s3://<span class="hljs-subst">{config[<span class="hljs-string">'bucket_name'</span>]}</span>/outputs/"</span>,
        code_location=<span class="hljs-string">f"s3://<span class="hljs-subst">{config[<span class="hljs-string">'bucket_name'</span>]}</span>/code/"</span>,
        disable_profiler=<span class="hljs-literal">True</span>,
        debugger_hook_config=<span class="hljs-literal">False</span>,
    )

    <span class="hljs-comment"># Define input data configuration</span>
    inputs = {
        <span class="hljs-string">"train"</span>: <span class="hljs-string">f"s3://<span class="hljs-subst">{config[<span class="hljs-string">'bucket_name'</span>]}</span>/data/train/"</span>,
        <span class="hljs-string">"validation"</span>: <span class="hljs-string">f"s3://<span class="hljs-subst">{config[<span class="hljs-string">'bucket_name'</span>]}</span>/data/validation/"</span>,
    }

    <span class="hljs-comment"># Launch training job</span>
    print(<span class="hljs-string">"Launching training job..."</span>)
    estimator.fit(inputs, job_name=config[<span class="hljs-string">"job_name"</span>], wait=<span class="hljs-literal">False</span>)

    <span class="hljs-comment"># Get job details</span>
    job_description = sagemaker_session.describe_training_job(
        TrainingJobName=config[<span class="hljs-string">"job_name"</span>]
    )

    print(<span class="hljs-string">f"\n✅ Training job launched successfully!"</span>)
    print(<span class="hljs-string">f"Job Name: <span class="hljs-subst">{config[<span class="hljs-string">'job_name'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"Job ARN: <span class="hljs-subst">{job_description[<span class="hljs-string">'TrainingJobArn'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"Instance: <span class="hljs-subst">{config[<span class="hljs-string">'instance_type'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"Spot Instances: <span class="hljs-subst">{config[<span class="hljs-string">'use_spot_instances'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"Estimated cost: $<span class="hljs-subst">{estimate_cost(config[<span class="hljs-string">'instance_type'</span>], config[<span class="hljs-string">'max_run_hours'</span>])}</span>"</span>)
    print(<span class="hljs-string">f"\nMonitor job at: https://<span class="hljs-subst">{session.region_name}</span>.console.aws.amazon.com/sagemaker/home?region=<span class="hljs-subst">{session.region_name}</span>#/training-jobs/<span class="hljs-subst">{config[<span class="hljs-string">'job_name'</span>]}</span>"</span>)

    <span class="hljs-keyword">return</span> config[<span class="hljs-string">"job_name"</span>]

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">estimate_cost</span>(<span class="hljs-params">instance_type, hours</span>):</span>
    <span class="hljs-string">"""Rough cost estimation"""</span>
    pricing = {
        <span class="hljs-string">"ml.g5.2xlarge"</span>: <span class="hljs-number">1.212</span>,  <span class="hljs-comment"># per hour</span>
        <span class="hljs-string">"ml.g5.4xlarge"</span>: <span class="hljs-number">2.176</span>,
        <span class="hljs-string">"ml.g5.8xlarge"</span>: <span class="hljs-number">4.352</span>,
        <span class="hljs-string">"ml.g5.12xlarge"</span>: <span class="hljs-number">6.528</span>,
    }

    base_cost = pricing.get(instance_type, <span class="hljs-number">1.5</span>) * hours
    spot_cost = base_cost * <span class="hljs-number">0.3</span>  <span class="hljs-comment"># ~70% discount for spot</span>

    <span class="hljs-keyword">return</span> round(spot_cost, <span class="hljs-number">2</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">monitor_job</span>(<span class="hljs-params">job_name</span>):</span>
    <span class="hljs-string">"""Monitor training job progress"""</span>
    client = boto3.client(<span class="hljs-string">'sagemaker'</span>)

    print(<span class="hljs-string">f"\nMonitoring job: <span class="hljs-subst">{job_name}</span>"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">50</span>)

    status = <span class="hljs-string">"InProgress"</span>
    <span class="hljs-keyword">while</span> status <span class="hljs-keyword">in</span> [<span class="hljs-string">"InProgress"</span>, <span class="hljs-string">"Starting"</span>]:
        <span class="hljs-keyword">try</span>:
            response = client.describe_training_job(TrainingJobName=job_name)
            status = response[<span class="hljs-string">'TrainingJobStatus'</span>]

            <span class="hljs-keyword">if</span> <span class="hljs-string">'TrainingStartTime'</span> <span class="hljs-keyword">in</span> response:
                elapsed = (time.time() - response[<span class="hljs-string">'TrainingStartTime'</span>].timestamp()) / <span class="hljs-number">60</span>
                print(<span class="hljs-string">f"Status: <span class="hljs-subst">{status}</span> | Elapsed: <span class="hljs-subst">{elapsed:<span class="hljs-number">.1</span>f}</span> min"</span>, end=<span class="hljs-string">'\r'</span>)

            <span class="hljs-comment"># Sleep only while the job is still running; print final metrics once it finishes</span>
            <span class="hljs-keyword">if</span> status <span class="hljs-keyword">in</span> [<span class="hljs-string">"InProgress"</span>, <span class="hljs-string">"Starting"</span>]:
                time.sleep(<span class="hljs-number">30</span>)
            <span class="hljs-keyword">elif</span> <span class="hljs-string">'FinalMetricDataList'</span> <span class="hljs-keyword">in</span> response:
                <span class="hljs-keyword">for</span> metric <span class="hljs-keyword">in</span> response[<span class="hljs-string">'FinalMetricDataList'</span>]:
                    print(<span class="hljs-string">f"<span class="hljs-subst">{metric[<span class="hljs-string">'MetricName'</span>]}</span>: <span class="hljs-subst">{metric[<span class="hljs-string">'Value'</span>]}</span>"</span>)

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"\nError monitoring: <span class="hljs-subst">{e}</span>"</span>)
            <span class="hljs-keyword">break</span>

    print(<span class="hljs-string">f"\nFinal Status: <span class="hljs-subst">{status}</span>"</span>)

    <span class="hljs-keyword">if</span> status == <span class="hljs-string">"Completed"</span>:
        print(<span class="hljs-string">"✅ Training completed successfully!"</span>)
        print(<span class="hljs-string">f"Model artifacts: <span class="hljs-subst">{response.get(<span class="hljs-string">'ModelArtifacts'</span>, {}).get(<span class="hljs-string">'S3ModelArtifacts'</span>, <span class="hljs-string">'N/A'</span>)}</span>"</span>)
    <span class="hljs-keyword">elif</span> status == <span class="hljs-string">"Failed"</span>:
        print(<span class="hljs-string">"❌ Training failed!"</span>)
        print(<span class="hljs-string">f"Failure reason: <span class="hljs-subst">{response.get(<span class="hljs-string">'FailureReason'</span>, <span class="hljs-string">'Unknown'</span>)}</span>"</span>)

    <span class="hljs-keyword">return</span> status

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-string">"""Main function"""</span>
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"Llama 3 Fine-Tuning on SageMaker - Launch Script"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    <span class="hljs-comment"># Step 1: Create training job</span>
    job_name = create_training_job()

    <span class="hljs-comment"># Step 2: Ask if user wants to monitor</span>
    monitor = input(<span class="hljs-string">"\nDo you want to monitor the job? (yes/no): "</span>).lower()
    <span class="hljs-keyword">if</span> monitor <span class="hljs-keyword">in</span> [<span class="hljs-string">'yes'</span>, <span class="hljs-string">'y'</span>]:
        monitor_job(job_name)

    <span class="hljs-comment"># Step 3: Show next steps</span>
    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"NEXT STEPS:"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"1. Wait for training to complete (2-4 hours)"</span>)
    print(<span class="hljs-string">"2. Check S3 for model artifacts:"</span>)
    print(<span class="hljs-string">f"   aws s3 ls s3://&lt;your-bucket&gt;/outputs/<span class="hljs-subst">{job_name}</span>/"</span>)  <span class="hljs-comment"># aws s3 ls does not expand wildcards</span>
    print(<span class="hljs-string">"3. Deploy the model:"</span>)
    print(<span class="hljs-string">"   python deploy_model.py --job-name "</span> + job_name)
    print(<span class="hljs-string">"\nTo check status manually:"</span>)
    print(<span class="hljs-string">f"   aws sagemaker describe-training-job --training-job-name <span class="hljs-subst">{job_name}</span>"</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
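<p>To make the spot math concrete: <code>ml.g5.2xlarge</code> runs at about $1.212/hour on demand (region-dependent), so a 4-hour run costs roughly $4.85; with managed spot training's up-to-70% discount the same run lands near $1.45, which is the figure <code>estimate_cost</code> reports:</p>
<pre><code class="lang-python"># Reproduce the launch script's cost estimate for a 4-hour ml.g5.2xlarge run.
# The hourly rate is an illustrative on-demand price; actual pricing varies by region.
on_demand_rate = 1.212   # USD/hour, ml.g5.2xlarge
hours = 4
spot_discount = 0.70     # managed spot training saves up to ~70%

on_demand_cost = on_demand_rate * hours
spot_cost = on_demand_cost * (1 - spot_discount)

print(f"on-demand: ${on_demand_cost:.2f}  spot: ${spot_cost:.2f}")  # on-demand: $4.85  spot: $1.45
</code></pre>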
<h3 id="heading-step-9-run-the-training"><strong>Step 9: Run the Training!</strong></h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Make scripts executable</span>
chmod +x launch_training.py
chmod +x scripts/*.py

<span class="hljs-comment"># Run the launch script</span>
python launch_training.py

<span class="hljs-comment"># Or run directly with minimal setup</span>
python -c <span class="hljs-string">"
import boto3
from sagemaker.huggingface import HuggingFace

# Quick start - minimal configuration
estimator = HuggingFace(
    entry_point='train.py',
    source_dir='scripts',
    instance_type='ml.g5.2xlarge',
    instance_count=1,
    role='your-sagemaker-role-arn',  # Replace with your role
    transformers_version='4.36',
    pytorch_version='2.1',
    py_version='py310',
    hyperparameters={
        'model_name': 'mistralai/Mistral-7B-Instruct-v0.1',
        'num_epochs': 1,  # Start with 1 epoch for testing
    }
)

# Start training
estimator.fit({
    'train': 's3://your-bucket/data/train/',
    'validation': 's3://your-bucket/data/validation/'
}, wait=True)
"</span>
</code></pre>
<h2 id="heading-phase-4-monitor-amp-deploy-after-training-completes"><strong>Phase 4: Monitor &amp; Deploy (After Training Completes)</strong></h2>
<h3 id="heading-step-10-check-training-results"><strong>Step 10: Check Training Results</strong></h3>
<p>Create <code>check_results.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># check_results.py</span>

<span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_training_job</span>(<span class="hljs-params">job_name</span>):</span>
    <span class="hljs-string">"""Check training job status and results"""</span>
    client = boto3.client(<span class="hljs-string">'sagemaker'</span>)

    <span class="hljs-keyword">try</span>:
        response = client.describe_training_job(TrainingJobName=job_name)

        print(<span class="hljs-string">f"Job Name: <span class="hljs-subst">{response[<span class="hljs-string">'TrainingJobName'</span>]}</span>"</span>)
        print(<span class="hljs-string">f"Status: <span class="hljs-subst">{response[<span class="hljs-string">'TrainingJobStatus'</span>]}</span>"</span>)
        print(<span class="hljs-string">f"Creation Time: <span class="hljs-subst">{response[<span class="hljs-string">'CreationTime'</span>]}</span>"</span>)

        <span class="hljs-keyword">if</span> <span class="hljs-string">'TrainingEndTime'</span> <span class="hljs-keyword">in</span> response:
            print(<span class="hljs-string">f"End Time: <span class="hljs-subst">{response[<span class="hljs-string">'TrainingEndTime'</span>]}</span>"</span>)
            duration = (response[<span class="hljs-string">'TrainingEndTime'</span>] - response[<span class="hljs-string">'TrainingStartTime'</span>]).total_seconds() / <span class="hljs-number">3600</span>
            print(<span class="hljs-string">f"Duration: <span class="hljs-subst">{duration:<span class="hljs-number">.2</span>f}</span> hours"</span>)

        <span class="hljs-keyword">if</span> <span class="hljs-string">'ModelArtifacts'</span> <span class="hljs-keyword">in</span> response:
            print(<span class="hljs-string">f"\nModel Artifacts: <span class="hljs-subst">{response[<span class="hljs-string">'ModelArtifacts'</span>][<span class="hljs-string">'S3ModelArtifacts'</span>]}</span>"</span>)

        <span class="hljs-keyword">if</span> <span class="hljs-string">'FinalMetricDataList'</span> <span class="hljs-keyword">in</span> response:
            print(<span class="hljs-string">"\nFinal Metrics:"</span>)
            <span class="hljs-keyword">for</span> metric <span class="hljs-keyword">in</span> response[<span class="hljs-string">'FinalMetricDataList'</span>]:
                print(<span class="hljs-string">f"  <span class="hljs-subst">{metric[<span class="hljs-string">'MetricName'</span>]}</span>: <span class="hljs-subst">{metric[<span class="hljs-string">'Value'</span>]:<span class="hljs-number">.4</span>f}</span>"</span>)

        <span class="hljs-comment"># Check for Spot training savings</span>
        <span class="hljs-keyword">if</span> response.get(<span class="hljs-string">'EnableManagedSpotTraining'</span>, <span class="hljs-literal">False</span>):
            billable_time = response.get(<span class="hljs-string">'BillableTimeInSeconds'</span>, <span class="hljs-number">0</span>)
            total_time = response.get(<span class="hljs-string">'TrainingTimeInSeconds'</span>, <span class="hljs-number">0</span>)
            <span class="hljs-keyword">if</span> total_time &gt; <span class="hljs-number">0</span>:
                savings = (<span class="hljs-number">1</span> - (billable_time / total_time)) * <span class="hljs-number">100</span>
                print(<span class="hljs-string">f"\nSpot Training Savings: <span class="hljs-subst">{savings:<span class="hljs-number">.1</span>f}</span>%"</span>)
                print(<span class="hljs-string">f"Billable time: <span class="hljs-subst">{billable_time/<span class="hljs-number">3600</span>:<span class="hljs-number">.1</span>f}</span>h"</span>)
                print(<span class="hljs-string">f"Total time: <span class="hljs-subst">{total_time/<span class="hljs-number">3600</span>:<span class="hljs-number">.1</span>f}</span>h"</span>)

        <span class="hljs-comment"># Estimate cost</span>
        instance_type = response[<span class="hljs-string">'ResourceConfig'</span>][<span class="hljs-string">'InstanceType'</span>]
        duration_hours = response.get(<span class="hljs-string">'TrainingTimeInSeconds'</span>, <span class="hljs-number">0</span>) / <span class="hljs-number">3600</span>

        <span class="hljs-comment"># Rough pricing (varies by region)</span>
        pricing = {
            <span class="hljs-string">'ml.g5.2xlarge'</span>: <span class="hljs-number">1.212</span>,
            <span class="hljs-string">'ml.g5.4xlarge'</span>: <span class="hljs-number">2.176</span>,
            <span class="hljs-string">'ml.g5.8xlarge'</span>: <span class="hljs-number">4.352</span>,
        }

        hourly_rate = pricing.get(instance_type, <span class="hljs-number">1.5</span>)
        cost = hourly_rate * duration_hours

        <span class="hljs-keyword">if</span> response.get(<span class="hljs-string">'EnableManagedSpotTraining'</span>, <span class="hljs-literal">False</span>):
            cost *= <span class="hljs-number">0.3</span>  <span class="hljs-comment"># ~70% discount</span>

        print(<span class="hljs-string">f"\nEstimated Cost: $<span class="hljs-subst">{cost:<span class="hljs-number">.2</span>f}</span>"</span>)

        <span class="hljs-keyword">return</span> response

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Error: <span class="hljs-subst">{e}</span>"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">download_model</span>(<span class="hljs-params">job_name, local_dir=<span class="hljs-string">"model_output"</span></span>):</span>
    <span class="hljs-string">"""Download trained model from S3"""</span>
    <span class="hljs-keyword">import</span> os
    <span class="hljs-keyword">from</span> urllib.parse <span class="hljs-keyword">import</span> urlparse
    <span class="hljs-keyword">import</span> tarfile

    <span class="hljs-comment"># Get model artifacts location</span>
    client = boto3.client(<span class="hljs-string">'sagemaker'</span>)
    response = client.describe_training_job(TrainingJobName=job_name)

    <span class="hljs-keyword">if</span> <span class="hljs-string">'ModelArtifacts'</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> response:
        print(<span class="hljs-string">"No model artifacts found"</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-literal">None</span>

    s3_path = response[<span class="hljs-string">'ModelArtifacts'</span>][<span class="hljs-string">'S3ModelArtifacts'</span>]

    <span class="hljs-comment"># Parse S3 URL</span>
    parsed = urlparse(s3_path)
    bucket = parsed.netloc
    key = parsed.path.lstrip(<span class="hljs-string">'/'</span>)

    <span class="hljs-comment"># Create local directory</span>
    os.makedirs(local_dir, exist_ok=<span class="hljs-literal">True</span>)

    <span class="hljs-comment"># Download file</span>
    local_file = os.path.join(local_dir, <span class="hljs-string">'model.tar.gz'</span>)

    print(<span class="hljs-string">f"Downloading model from s3://<span class="hljs-subst">{bucket}</span>/<span class="hljs-subst">{key}</span>"</span>)
    print(<span class="hljs-string">f"To: <span class="hljs-subst">{local_file}</span>"</span>)

    s3 = boto3.client(<span class="hljs-string">'s3'</span>)
    s3.download_file(bucket, key, local_file)

    <span class="hljs-comment"># Extract if it's a tar file</span>
    <span class="hljs-keyword">if</span> local_file.endswith(<span class="hljs-string">'.tar.gz'</span>):
        print(<span class="hljs-string">"Extracting model..."</span>)
        <span class="hljs-keyword">with</span> tarfile.open(local_file, <span class="hljs-string">'r:gz'</span>) <span class="hljs-keyword">as</span> tar:
            tar.extractall(path=local_dir)

        <span class="hljs-comment"># Remove tar file</span>
        os.remove(local_file)

    print(<span class="hljs-string">f"Model downloaded to: <span class="hljs-subst">{local_dir}</span>"</span>)

    <span class="hljs-comment"># List contents</span>
    print(<span class="hljs-string">"\nModel contents:"</span>)
    <span class="hljs-keyword">for</span> root, dirs, files <span class="hljs-keyword">in</span> os.walk(local_dir):
        <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files[:<span class="hljs-number">10</span>]:  <span class="hljs-comment"># Show first 10 files</span>
            print(<span class="hljs-string">f"  <span class="hljs-subst">{os.path.join(root, file)}</span>"</span>)

    <span class="hljs-keyword">return</span> local_dir

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">import</span> sys

    <span class="hljs-keyword">if</span> len(sys.argv) &gt; <span class="hljs-number">1</span>:
        job_name = sys.argv[<span class="hljs-number">1</span>]
    <span class="hljs-keyword">else</span>:
        job_name = input(<span class="hljs-string">"Enter training job name: "</span>)

    print(<span class="hljs-string">f"Checking job: <span class="hljs-subst">{job_name}</span>"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    result = check_training_job(job_name)

    <span class="hljs-keyword">if</span> result <span class="hljs-keyword">and</span> result[<span class="hljs-string">'TrainingJobStatus'</span>] == <span class="hljs-string">'Completed'</span>:
        download = input(<span class="hljs-string">"\nDownload model? (yes/no): "</span>).lower()
        <span class="hljs-keyword">if</span> download <span class="hljs-keyword">in</span> [<span class="hljs-string">'yes'</span>, <span class="hljs-string">'y'</span>]:
            download_model(job_name)
</code></pre>
<p>Run it:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># After training completes</span>
python check_results.py your-job-name-here
</code></pre>
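<p>The cost figure this script prints boils down to billable seconds times an hourly rate. A minimal sketch of that arithmetic, using the ~$0.65/hr spot rate for ml.g5.2xlarge assumed elsewhere in this guide (verify against current AWS pricing for your region):</p>

```python
# Turn a describe_training_job response into a dollar figure.
# The rate below is this guide's working assumption for ml.g5.2xlarge spot,
# not a live price -- check the AWS pricing page for your region.
SPOT_RATE_PER_HOUR = 0.6528

def cost_from_job(job: dict, hourly_rate: float = SPOT_RATE_PER_HOUR) -> float:
    """Convert the BillableTimeInSeconds field of a training-job description to dollars."""
    seconds = job.get("BillableTimeInSeconds", 0)
    return round(seconds / 3600 * hourly_rate, 2)

# e.g. a job that billed 4 hours on spot:
print(cost_from_job({"BillableTimeInSeconds": 4 * 3600}))  # → 2.61
```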
<h3 id="heading-step-11-deploy-the-model"><strong>Step 11: Deploy the Model</strong></h3>
<p>Create <code>deploy_model.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># deploy_model.py</span>

<span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> sagemaker.huggingface <span class="hljs-keyword">import</span> HuggingFaceModel
<span class="hljs-keyword">from</span> sagemaker <span class="hljs-keyword">import</span> Session

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">deploy_finetuned_model</span>(<span class="hljs-params">job_name, endpoint_name=None</span>):</span>
    <span class="hljs-string">"""Deploy the fine-tuned model to a SageMaker endpoint"""</span>

    <span class="hljs-comment"># Initialize</span>
    session = Session()
    region = session.boto_region_name

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> endpoint_name:
        endpoint_name = <span class="hljs-string">f"ft-<span class="hljs-subst">{job_name[:<span class="hljs-number">30</span>]}</span>"</span>  <span class="hljs-comment"># Truncate: endpoint names max 63 chars</span>

    print(<span class="hljs-string">f"Deploying model from job: <span class="hljs-subst">{job_name}</span>"</span>)
    print(<span class="hljs-string">f"Endpoint name: <span class="hljs-subst">{endpoint_name}</span>"</span>)
    print(<span class="hljs-string">f"Region: <span class="hljs-subst">{region}</span>"</span>)

    <span class="hljs-comment"># Get model artifacts location</span>
    sm_client = boto3.client(<span class="hljs-string">'sagemaker'</span>, region_name=region)

    <span class="hljs-keyword">try</span>:
        job_info = sm_client.describe_training_job(TrainingJobName=job_name)
        model_s3_path = job_info[<span class="hljs-string">'ModelArtifacts'</span>][<span class="hljs-string">'S3ModelArtifacts'</span>]

        print(<span class="hljs-string">f"Model artifacts: <span class="hljs-subst">{model_s3_path}</span>"</span>)

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Error getting job info: <span class="hljs-subst">{e}</span>"</span>)
        print(<span class="hljs-string">"Trying to find model in S3..."</span>)

        <span class="hljs-comment"># Try to find model in S3</span>
        s3_client = boto3.client(<span class="hljs-string">'s3'</span>)

        <span class="hljs-comment"># Look for output directory</span>
        bucket = <span class="hljs-string">f"llama3-finetune-<span class="hljs-subst">{job_name.split(<span class="hljs-string">'-'</span>)[<span class="hljs-number">-1</span>]}</span>"</span>
        prefix = <span class="hljs-string">f"outputs/<span class="hljs-subst">{job_name}</span>/"</span>

        <span class="hljs-keyword">try</span>:
            response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
            <span class="hljs-keyword">if</span> <span class="hljs-string">'Contents'</span> <span class="hljs-keyword">in</span> response:
                <span class="hljs-keyword">for</span> obj <span class="hljs-keyword">in</span> response[<span class="hljs-string">'Contents'</span>]:
                    <span class="hljs-keyword">if</span> obj[<span class="hljs-string">'Key'</span>].endswith(<span class="hljs-string">'output/model.tar.gz'</span>):
                        model_s3_path = <span class="hljs-string">f"s3://<span class="hljs-subst">{bucket}</span>/<span class="hljs-subst">{obj[<span class="hljs-string">'Key'</span>]}</span>"</span>
                        <span class="hljs-keyword">break</span>
        <span class="hljs-keyword">except</span> Exception:
            model_s3_path = input(<span class="hljs-string">"Enter full S3 path to model.tar.gz: "</span>)

    <span class="hljs-comment"># Create HuggingFace model</span>
    print(<span class="hljs-string">"\nCreating model object..."</span>)

    huggingface_model = HuggingFaceModel(
        model_data=model_s3_path,
        role=<span class="hljs-string">'your-sagemaker-role-arn'</span>,  <span class="hljs-comment"># Replace with your role</span>
        transformers_version=<span class="hljs-string">'4.36.0'</span>,
        pytorch_version=<span class="hljs-string">'2.1.0'</span>,
        py_version=<span class="hljs-string">'py310'</span>,
        env={
            <span class="hljs-string">'HF_MODEL_ID'</span>: <span class="hljs-string">'mistralai/Mistral-7B-Instruct-v0.1'</span>,  <span class="hljs-comment"># base model ID; may override model_data in some containers</span>
            <span class="hljs-string">'SM_NUM_GPUS'</span>: <span class="hljs-string">'1'</span>,
            <span class="hljs-string">'MAX_INPUT_LENGTH'</span>: <span class="hljs-string">'512'</span>,
            <span class="hljs-string">'MAX_TOTAL_TOKENS'</span>: <span class="hljs-string">'1024'</span>,
        }
    )

    <span class="hljs-comment"># Deploy to endpoint</span>
    print(<span class="hljs-string">"Deploying endpoint (this will take 5-10 minutes)..."</span>)

    predictor = huggingface_model.deploy(
        initial_instance_count=<span class="hljs-number">1</span>,
        instance_type=<span class="hljs-string">'ml.g5.xlarge'</span>,  <span class="hljs-comment"># Smaller than training instance</span>
        endpoint_name=endpoint_name,
        wait=<span class="hljs-literal">True</span>
    )

    print(<span class="hljs-string">f"\n✅ Endpoint deployed successfully!"</span>)
    print(<span class="hljs-string">f"Endpoint name: <span class="hljs-subst">{endpoint_name}</span>"</span>)
    print(<span class="hljs-string">f"Instance type: ml.g5.xlarge"</span>)
    print(<span class="hljs-string">f"Endpoint: <span class="hljs-subst">{predictor.endpoint_name}</span>"</span>)

    <span class="hljs-comment"># Test the endpoint</span>
    print(<span class="hljs-string">"\nTesting endpoint..."</span>)

    test_prompt = {
        <span class="hljs-string">"inputs"</span>: <span class="hljs-string">"### Instruction:\nHow do I reset my password?\n\n### Response:"</span>,
        <span class="hljs-string">"parameters"</span>: {
            <span class="hljs-string">"max_new_tokens"</span>: <span class="hljs-number">200</span>,
            <span class="hljs-string">"temperature"</span>: <span class="hljs-number">0.7</span>,
            <span class="hljs-string">"top_p"</span>: <span class="hljs-number">0.9</span>,
            <span class="hljs-string">"do_sample"</span>: <span class="hljs-literal">True</span>
        }
    }

    <span class="hljs-keyword">try</span>:
        response = predictor.predict(test_prompt)
        print(<span class="hljs-string">"Test response:"</span>)
        print(json.dumps(response, indent=<span class="hljs-number">2</span>)[:<span class="hljs-number">500</span>] + <span class="hljs-string">"..."</span>)

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Test failed: <span class="hljs-subst">{e}</span>"</span>)

    <span class="hljs-keyword">return</span> predictor

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_endpoint</span>(<span class="hljs-params">endpoint_name</span>):</span>
    <span class="hljs-string">"""Test an existing endpoint"""</span>
    <span class="hljs-keyword">import</span> boto3

    runtime = boto3.client(<span class="hljs-string">'sagemaker-runtime'</span>)

    prompt = {
        <span class="hljs-string">"inputs"</span>: <span class="hljs-string">"### Instruction:\nWhat's your refund policy?\n\n### Response:"</span>,
        <span class="hljs-string">"parameters"</span>: {
            <span class="hljs-string">"max_new_tokens"</span>: <span class="hljs-number">100</span>,
            <span class="hljs-string">"temperature"</span>: <span class="hljs-number">0.1</span>  <span class="hljs-comment"># Lower temperature for more focused responses</span>
        }
    }

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=<span class="hljs-string">'application/json'</span>,
        Body=json.dumps(prompt)
    )

    result = json.loads(response[<span class="hljs-string">'Body'</span>].read().decode())
    print(<span class="hljs-string">"Response from endpoint:"</span>)
    print(result[<span class="hljs-number">0</span>][<span class="hljs-string">'generated_text'</span>])

    <span class="hljs-keyword">return</span> result

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">cleanup</span>(<span class="hljs-params">endpoint_name</span>):</span>
    <span class="hljs-string">"""Delete endpoint to stop charges"""</span>
    print(<span class="hljs-string">f"Deleting endpoint: <span class="hljs-subst">{endpoint_name}</span>"</span>)

    sm_client = boto3.client(<span class="hljs-string">'sagemaker'</span>)

    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Look up the endpoint config BEFORE deleting the endpoint;</span>
        <span class="hljs-comment"># describe_endpoint fails once the endpoint is gone</span>
        endpoint_info = sm_client.describe_endpoint(EndpointName=endpoint_name)
        config_name = endpoint_info[<span class="hljs-string">'EndpointConfigName'</span>]

        sm_client.delete_endpoint(EndpointName=endpoint_name)
        print(<span class="hljs-string">f"Endpoint <span class="hljs-subst">{endpoint_name}</span> deleted"</span>)

        sm_client.delete_endpoint_config(EndpointConfigName=config_name)
        print(<span class="hljs-string">f"Endpoint config <span class="hljs-subst">{config_name}</span> deleted"</span>)

    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Error deleting endpoint: <span class="hljs-subst">{e}</span>"</span>)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">import</span> argparse

    parser = argparse.ArgumentParser(description=<span class="hljs-string">"Deploy fine-tuned model"</span>)
    parser.add_argument(<span class="hljs-string">"--job-name"</span>, required=<span class="hljs-literal">True</span>, help=<span class="hljs-string">"Training job name"</span>)
    parser.add_argument(<span class="hljs-string">"--endpoint-name"</span>, help=<span class="hljs-string">"Endpoint name (optional)"</span>)
    parser.add_argument(<span class="hljs-string">"--test"</span>, action=<span class="hljs-string">"store_true"</span>, help=<span class="hljs-string">"Test existing endpoint"</span>)
    parser.add_argument(<span class="hljs-string">"--cleanup"</span>, action=<span class="hljs-string">"store_true"</span>, help=<span class="hljs-string">"Delete endpoint"</span>)

    args = parser.parse_args()

    <span class="hljs-keyword">if</span> args.cleanup <span class="hljs-keyword">and</span> args.endpoint_name:
        cleanup(args.endpoint_name)

    <span class="hljs-keyword">elif</span> args.test <span class="hljs-keyword">and</span> args.endpoint_name:
        test_endpoint(args.endpoint_name)

    <span class="hljs-keyword">else</span>:
        deploy_finetuned_model(args.job_name, args.endpoint_name)
</code></pre>
<p>Run deployment:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Deploy the model</span>
python deploy_model.py --job-name your-training-job-name

<span class="hljs-comment"># Test the endpoint</span>
python deploy_model.py --<span class="hljs-built_in">test</span> --endpoint-name ft-your-job-name

<span class="hljs-comment"># Clean up (important to avoid charges!)</span>
python deploy_model.py --cleanup --endpoint-name ft-your-job-name
</code></pre>
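<p>The cleanup step deserves emphasis: a real-time endpoint bills per instance-hour whether or not it serves traffic. At the ~$1.212/hr on-demand rate for ml.g5.xlarge used in this guide's estimates, a forgotten endpoint adds up quickly:</p>

```python
# Cost of leaving an inference endpoint running (rate is this guide's
# working figure for ml.g5.xlarge on-demand; check current AWS pricing)
HOURLY_RATE = 1.212

def idle_endpoint_cost(days: float, rate: float = HOURLY_RATE) -> float:
    """Real-time endpoints bill 24/7 per instance, independent of traffic."""
    return round(rate * 24 * days, 2)

print(idle_endpoint_cost(1))   # → 29.09 for a single day
print(idle_endpoint_cost(30))  # → 872.64 for a month
```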
<h2 id="heading-phase-5-production-considerations"><strong>Phase 5: Production Considerations</strong></h2>
<h3 id="heading-step-12-create-production-setup-script"><strong>Step 12: Create Production Setup Script</strong></h3>
<p>Create <code>production_setup.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python3</span>
<span class="hljs-comment"># production_setup.py</span>

<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_ci_cd_pipeline</span>():</span>
    <span class="hljs-string">"""Create CI/CD pipeline configuration"""</span>

    pipeline_config = {
        <span class="hljs-string">"name"</span>: <span class="hljs-string">"llama-finetune-pipeline"</span>,
        <span class="hljs-string">"stages"</span>: [
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"DataValidation"</span>,
                <span class="hljs-string">"script"</span>: <span class="hljs-string">"scripts/validate_data.py"</span>,
                <span class="hljs-string">"instance"</span>: <span class="hljs-string">"ml.m5.large"</span>,
                <span class="hljs-string">"timeout"</span>: <span class="hljs-number">1800</span>
            },
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"Training"</span>,
                <span class="hljs-string">"script"</span>: <span class="hljs-string">"scripts/train.py"</span>,
                <span class="hljs-string">"instance"</span>: <span class="hljs-string">"ml.g5.2xlarge"</span>,
                <span class="hljs-string">"use_spot"</span>: <span class="hljs-literal">True</span>,
                <span class="hljs-string">"hyperparameters"</span>: {
                    <span class="hljs-string">"model_name"</span>: <span class="hljs-string">"mistralai/Mistral-7B-Instruct-v0.1"</span>,
                    <span class="hljs-string">"num_epochs"</span>: <span class="hljs-number">3</span>,
                    <span class="hljs-string">"learning_rate"</span>: <span class="hljs-string">"2e-4"</span>
                }
            },
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"Evaluation"</span>,
                <span class="hljs-string">"script"</span>: <span class="hljs-string">"scripts/evaluate.py"</span>,
                <span class="hljs-string">"instance"</span>: <span class="hljs-string">"ml.g5.xlarge"</span>,
                <span class="hljs-string">"metrics"</span>: [<span class="hljs-string">"accuracy"</span>, <span class="hljs-string">"perplexity"</span>, <span class="hljs-string">"bleu"</span>]
            },
            {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"Deployment"</span>,
                <span class="hljs-string">"condition"</span>: <span class="hljs-string">"evaluation.accuracy &gt; 0.85"</span>,
                <span class="hljs-string">"instance"</span>: <span class="hljs-string">"ml.g5.xlarge"</span>,
                <span class="hljs-string">"auto_scale"</span>: {
                    <span class="hljs-string">"min_capacity"</span>: <span class="hljs-number">1</span>,
                    <span class="hljs-string">"max_capacity"</span>: <span class="hljs-number">5</span>
                }
            }
        ],
        <span class="hljs-string">"monitoring"</span>: {
            <span class="hljs-string">"cloudwatch_metrics"</span>: [
                <span class="hljs-string">"Invocations"</span>,
                <span class="hljs-string">"ModelLatency"</span>,
                <span class="hljs-string">"CPUUtilization"</span>,
                <span class="hljs-string">"MemoryUtilization"</span>
            ],
            <span class="hljs-string">"alarms"</span>: [
                {
                    <span class="hljs-string">"metric"</span>: <span class="hljs-string">"ModelLatency"</span>,
                    <span class="hljs-string">"threshold"</span>: <span class="hljs-number">1000</span>,  <span class="hljs-comment"># ms</span>
                    <span class="hljs-string">"periods"</span>: <span class="hljs-number">2</span>
                },
                {
                    <span class="hljs-string">"metric"</span>: <span class="hljs-string">"Invocations"</span>,
                    <span class="hljs-string">"threshold"</span>: <span class="hljs-number">1000</span>,  <span class="hljs-comment"># per minute</span>
                    <span class="hljs-string">"periods"</span>: <span class="hljs-number">5</span>
                }
            ]
        },
        <span class="hljs-string">"cost_tracking"</span>: {
            <span class="hljs-string">"daily_budget"</span>: <span class="hljs-number">50</span>,
            <span class="hljs-string">"alarm_threshold"</span>: <span class="hljs-number">80</span>,
            <span class="hljs-string">"report_frequency"</span>: <span class="hljs-string">"daily"</span>
        }
    }

    <span class="hljs-comment"># Save pipeline config</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'pipeline_config.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        json.dump(pipeline_config, f, indent=<span class="hljs-number">2</span>)

    print(<span class="hljs-string">"✅ CI/CD pipeline configuration created"</span>)
    print(<span class="hljs-string">"Next steps:"</span>)
    print(<span class="hljs-string">"1. Review pipeline_config.json"</span>)
    print(<span class="hljs-string">"2. Set up CodePipeline in AWS Console"</span>)
    print(<span class="hljs-string">"3. Configure S3 triggers for automatic retraining"</span>)
    print(<span class="hljs-string">"4. Set up CloudWatch alarms for monitoring"</span>)

    <span class="hljs-keyword">return</span> pipeline_config

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_monitoring_dashboard</span>():</span>
    <span class="hljs-string">"""Create CloudWatch dashboard configuration"""</span>

    dashboard = {
        <span class="hljs-string">"widgets"</span>: [
            {
                <span class="hljs-string">"type"</span>: <span class="hljs-string">"metric"</span>,
                <span class="hljs-string">"properties"</span>: {
                    <span class="hljs-string">"metrics"</span>: [
                        [<span class="hljs-string">"AWS/SageMaker"</span>, <span class="hljs-string">"Invocations"</span>, <span class="hljs-string">"EndpointName"</span>, <span class="hljs-string">"your-endpoint"</span>],
                        [<span class="hljs-string">"AWS/SageMaker"</span>, <span class="hljs-string">"ModelLatency"</span>, <span class="hljs-string">"EndpointName"</span>, <span class="hljs-string">"your-endpoint"</span>]
                    ],
                    <span class="hljs-string">"view"</span>: <span class="hljs-string">"timeSeries"</span>,
                    <span class="hljs-string">"stacked"</span>: <span class="hljs-literal">False</span>,
                    <span class="hljs-string">"region"</span>: <span class="hljs-string">"us-east-1"</span>,
                    <span class="hljs-string">"title"</span>: <span class="hljs-string">"Endpoint Performance"</span>
                }
            },
            {
                <span class="hljs-string">"type"</span>: <span class="hljs-string">"metric"</span>,
                <span class="hljs-string">"properties"</span>: {
                    <span class="hljs-string">"metrics"</span>: [
                        [<span class="hljs-string">"AWS/SageMaker"</span>, <span class="hljs-string">"CPUUtilization"</span>, <span class="hljs-string">"EndpointName"</span>, <span class="hljs-string">"your-endpoint"</span>],
                        [<span class="hljs-string">"AWS/SageMaker"</span>, <span class="hljs-string">"MemoryUtilization"</span>, <span class="hljs-string">"EndpointName"</span>, <span class="hljs-string">"your-endpoint"</span>]
                    ],
                    <span class="hljs-string">"view"</span>: <span class="hljs-string">"gauge"</span>,
                    <span class="hljs-string">"region"</span>: <span class="hljs-string">"us-east-1"</span>,
                    <span class="hljs-string">"title"</span>: <span class="hljs-string">"Resource Utilization"</span>
                }
            },
            {
                <span class="hljs-string">"type"</span>: <span class="hljs-string">"text"</span>,
                <span class="hljs-string">"properties"</span>: {
                    <span class="hljs-string">"markdown"</span>: <span class="hljs-string">"# Fine-Tuned Model Dashboard\n\n## Key Metrics\n- **Cost Today**: $12.45\n- **Total Invocations**: 12,345\n- **Avg Latency**: 245ms\n- **Error Rate**: 0.12%\n\n## Actions\n- [View Detailed Logs](https://console.aws.amazon.com/cloudwatch/home)\n- [Open SageMaker Console](https://console.aws.amazon.com/sagemaker/home)"</span>
                }
            }
        ]
    }

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'dashboard_config.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        json.dump(dashboard, f, indent=<span class="hljs-number">2</span>)

    print(<span class="hljs-string">"✅ Dashboard configuration created"</span>)

    <span class="hljs-keyword">return</span> dashboard

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_cost_estimator</span>():</span>
    <span class="hljs-string">"""Create cost estimation tool"""</span>

    estimator = {
        <span class="hljs-string">"instance_pricing"</span>: {
            <span class="hljs-string">"ml.g5.xlarge"</span>: {<span class="hljs-string">"on_demand"</span>: <span class="hljs-number">1.212</span>, <span class="hljs-string">"spot"</span>: <span class="hljs-number">0.3636</span>},
            <span class="hljs-string">"ml.g5.2xlarge"</span>: {<span class="hljs-string">"on_demand"</span>: <span class="hljs-number">2.176</span>, <span class="hljs-string">"spot"</span>: <span class="hljs-number">0.6528</span>},
            <span class="hljs-string">"ml.g5.4xlarge"</span>: {<span class="hljs-string">"on_demand"</span>: <span class="hljs-number">4.352</span>, <span class="hljs-string">"spot"</span>: <span class="hljs-number">1.3056</span>},
            <span class="hljs-string">"ml.g5.8xlarge"</span>: {<span class="hljs-string">"on_demand"</span>: <span class="hljs-number">8.704</span>, <span class="hljs-string">"spot"</span>: <span class="hljs-number">2.6112</span>},
            <span class="hljs-string">"ml.g5.12xlarge"</span>: {<span class="hljs-string">"on_demand"</span>: <span class="hljs-number">13.056</span>, <span class="hljs-string">"spot"</span>: <span class="hljs-number">3.9168</span>}
        },
        <span class="hljs-string">"training_estimator"</span>: {
            <span class="hljs-string">"small"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.2xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">4</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">8.70</span>},
            <span class="hljs-string">"medium"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.4xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">8</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">34.82</span>},
            <span class="hljs-string">"large"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.8xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">16</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">139.26</span>}
        },
        <span class="hljs-string">"inference_estimator"</span>: {
            <span class="hljs-string">"low_traffic"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">24</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">29.09</span>},
            <span class="hljs-string">"medium_traffic"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.2xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">24</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">52.22</span>},
            <span class="hljs-string">"high_traffic"</span>: {<span class="hljs-string">"instances"</span>: <span class="hljs-string">"ml.g5.4xlarge"</span>, <span class="hljs-string">"hours"</span>: <span class="hljs-number">24</span>, <span class="hljs-string">"cost"</span>: <span class="hljs-number">104.45</span>}
        }
    }

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'cost_estimator.json'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        json.dump(estimator, f, indent=<span class="hljs-number">2</span>)

    print(<span class="hljs-string">"✅ Cost estimator created"</span>)

    <span class="hljs-comment"># Create simple Python calculator</span>
    calculator_code = <span class="hljs-string">'''
def estimate_training_cost(instance_type, hours, use_spot=True):
    """Estimate training cost"""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
        "ml.g5.4xlarge": 4.352,
        "ml.g5.8xlarge": 8.704,
    }

    hourly = pricing.get(instance_type, 2.0)
    if use_spot:
        hourly *= 0.3  # 70% discount

    return hourly * hours

def estimate_monthly_inference(instance_type, requests_per_day, avg_latency_ms=200):
    """Estimate monthly inference cost"""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
    }

    # Calculate instance hours needed
    total_processing_seconds = requests_per_day * (avg_latency_ms / 1000)
    instance_hours = total_processing_seconds / 3600

    # Add 20% buffer
    instance_hours *= 1.2

    hourly = pricing.get(instance_type, 1.5)
    daily_cost = hourly * instance_hours
    monthly_cost = daily_cost * 30

    return {
        "daily_cost": round(daily_cost, 2),
        "monthly_cost": round(monthly_cost, 2),
        "instance_hours_per_day": round(instance_hours, 2)
    }
'''</span>

    <span class="hljs-keyword">with</span> open(<span class="hljs-string">'cost_calculator.py'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> f:
        f.write(calculator_code)

    <span class="hljs-keyword">return</span> estimator

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    print(<span class="hljs-string">"Setting up production configuration..."</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    <span class="hljs-comment"># Create all configurations</span>
    pipeline = create_ci_cd_pipeline()
    dashboard = create_monitoring_dashboard()
    cost_config = create_cost_estimator()

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"PRODUCTION SETUP COMPLETE"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"\nCreated files:"</span>)
    print(<span class="hljs-string">"1. pipeline_config.json - CI/CD pipeline configuration"</span>)
    print(<span class="hljs-string">"2. dashboard_config.json - CloudWatch dashboard"</span>)
    print(<span class="hljs-string">"3. cost_estimator.json - Cost estimation data"</span>)
    print(<span class="hljs-string">"4. cost_calculator.py - Python cost calculator"</span>)

    print(<span class="hljs-string">"\nNext steps for production:"</span>)
    print(<span class="hljs-string">"1. Set up AWS Budgets with alerts"</span>)
    print(<span class="hljs-string">"2. Configure VPC for private endpoint access"</span>)
    print(<span class="hljs-string">"3. Set up logging to S3 for compliance"</span>)
    print(<span class="hljs-string">"4. Implement A/B testing for model versions"</span>)
    print(<span class="hljs-string">"5. Create automated retraining pipeline"</span>)
</code></pre>
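<p>Once the script above runs, the helpers it writes into <code>cost_calculator.py</code> can be used directly. Here is the training-cost helper reproduced standalone with a sample calculation (the rates are the guide's assumptions, and the flat 70% spot discount is an approximation — actual spot savings fluctuate):</p>

```python
# Standalone copy of the estimate_training_cost helper that
# production_setup.py writes out, plus a sample calculation.
def estimate_training_cost(instance_type, hours, use_spot=True):
    """Estimate training cost from assumed hourly rates (not live AWS prices)."""
    pricing = {
        "ml.g5.xlarge": 1.212,
        "ml.g5.2xlarge": 2.176,
        "ml.g5.4xlarge": 4.352,
        "ml.g5.8xlarge": 8.704,
    }
    hourly = pricing.get(instance_type, 2.0)  # fallback rate for unknown types
    if use_spot:
        hourly *= 0.3  # assume ~70% spot discount
    return hourly * hours

# 4-hour run on ml.g5.2xlarge, spot vs on-demand:
print(round(estimate_training_cost("ml.g5.2xlarge", 4), 2))                  # → 2.61
print(round(estimate_training_cost("ml.g5.2xlarge", 4, use_spot=False), 2))  # → 8.7
```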
<h2 id="heading-troubleshooting-common-issues"><strong>Troubleshooting Common Issues</strong></h2>
<h3 id="heading-issue-1-no-space-left-on-device"><strong>Issue 1: "No space left on device"</strong></h3>
<pre><code class="lang-python"><span class="hljs-comment"># Add to training script:</span>
training_args = TrainingArguments(
    gradient_checkpointing=True,  <span class="hljs-comment"># Reduces memory</span>
    gradient_accumulation_steps=4,  <span class="hljs-comment"># Simulates larger batch</span>
    fp16=False,  <span class="hljs-comment"># Use bf16 instead</span>
    bf16=True,
)
</code></pre>
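<p>Note that "No space left on device" usually means the training container's disk is filling up (checkpoints, cached datasets) rather than GPU memory; the flags above relieve memory pressure, while disk pressure is relieved by enlarging the EBS volume attached to the training job. A sketch of the relevant estimator settings (parameter names follow the SageMaker Python SDK; values are illustrative):</p>

```python
# Illustrative SageMaker estimator settings; volume_size (GB) sizes the EBS
# disk attached to the training instance -- raise it if checkpoints fill it up.
estimator_kwargs = {
    "instance_type": "ml.g5.2xlarge",
    "instance_count": 1,
    "volume_size": 200,          # more room for checkpoints and datasets
    "use_spot_instances": True,  # cheaper, but jobs can be interrupted
    "max_run": 5 * 3600,         # hard cap on training time (seconds)
    "max_wait": 6 * 3600,        # spot: must be >= max_run
}

print(estimator_kwargs["volume_size"])
```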
<h3 id="heading-issue-2-training-too-slow"><strong>Issue 2: Training too slow</strong></h3>
<pre><code class="lang-python"><span class="hljs-comment"># Switch to a faster instance</span>
<span class="hljs-comment"># ml.g5.2xlarge → ml.g5.4xlarge (2x faster, 2x cost)</span>
<span class="hljs-comment"># Use gradient accumulation instead of larger batch size</span>
</code></pre>
<h3 id="heading-issue-3-model-not-learning"><strong>Issue 3: Model not learning</strong></h3>
<pre><code class="lang-python"><span class="hljs-comment"># Check your data format</span>
<span class="hljs-comment"># Lower learning rate: 2e-4 → 1e-4</span>
<span class="hljs-comment"># Increase epochs: 3 → 5</span>
<span class="hljs-comment"># Add more diverse training examples</span>
</code></pre>
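<p>Those adjustments map directly onto training hyperparameters. A minimal sketch, using the Hugging Face <code>transformers</code> <code>TrainingArguments</code> parameter names (values are illustrative starting points, not tuned results):</p>

```python
# Illustrative hyperparameters for a run that isn't learning; names follow
# the Hugging Face transformers TrainingArguments convention.
hyperparameters = {
    "learning_rate": 1e-4,         # lowered from 2e-4
    "num_train_epochs": 5,         # raised from 3
    "warmup_ratio": 0.03,          # a gentle warmup stabilizes early steps
    "lr_scheduler_type": "cosine",
}

print(hyperparameters["learning_rate"])
```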
<h2 id="heading-quick-start-one-command-setup"><strong>Quick Start - One Command Setup</strong></h2>
<p>Create <code>setup.sh</code>:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment"># setup.sh - Complete setup script</span>

<span class="hljs-built_in">echo</span> <span class="hljs-string">"🚀 Starting Llama 3 Fine-Tuning Setup..."</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"=========================================="</span>

<span class="hljs-comment"># Step 1: Setup environment</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"1. Setting up Python environment..."</span>
python -m venv venv
<span class="hljs-built_in">source</span> venv/bin/activate
pip install -r requirements.txt

<span class="hljs-comment"># Step 2: Prepare data</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"2. Preparing sample data..."</span>
python data/prepare_data.py

<span class="hljs-comment"># Step 3: Setup AWS (interactive)</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"3. Setting up AWS..."</span>
<span class="hljs-built_in">read</span> -p <span class="hljs-string">"Enter your SageMaker Role ARN: "</span> ROLE_ARN
<span class="hljs-built_in">read</span> -p <span class="hljs-string">"Enter S3 bucket name: "</span> BUCKET_NAME

<span class="hljs-comment"># Step 4: Upload to S3</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"4. Uploading to S3..."</span>
aws s3 mb s3://<span class="hljs-variable">$BUCKET_NAME</span>
aws s3 cp data/train.json s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/train/
aws s3 cp data/validation.json s3://<span class="hljs-variable">$BUCKET_NAME</span>/data/validation/

<span class="hljs-comment"># Step 5: Launch training</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"5. Launching training job..."</span>
python launch_training.py

<span class="hljs-built_in">echo</span> <span class="hljs-string">"✅ Setup complete!"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-string">"Training job launched. Check AWS Console for progress."</span>
</code></pre>
<p>Make it executable and run:</p>
<pre><code class="lang-bash">chmod +x setup.sh
./setup.sh
</code></pre>
<hr />
<h2 id="heading-summary-your-complete-path"><strong>Summary: Your Complete Path</strong></h2>
<ol>
<li><p><strong>Hour 0-1</strong>: Setup AWS, install dependencies, prepare data</p>
</li>
<li><p><strong>Hour 1-2</strong>: Configure SageMaker, upload data to S3</p>
</li>
<li><p><strong>Hour 2-3</strong>: Launch training job (runs for 2-4 hours)</p>
</li>
<li><p><strong>Hour 6-7</strong>: Check results, download model</p>
</li>
<li><p><strong>Hour 7-8</strong>: Deploy endpoint, test inference</p>
</li>
<li><p><strong>Hour 8+</strong>: Set up monitoring, CI/CD, production features</p>
</li>
</ol>
</li>
</ol>
</li>
</ul>
<p>        <strong>Total hands-on time</strong>: 2-3 hours<br />        <strong>Total wait time</strong>: 2-4 hours (training) + 10-15 minutes (deployment)<br />        <strong>Total cost</strong>: $10-50 depending on configuration</p>
<hr />
<p>        <strong>Need help?</strong> Common issues and solutions:</p>
<ol>
<li><p><strong>Permission errors</strong>: Make sure your IAM role has SageMakerFullAccess</p>
</li>
<li><p><strong>Out of memory</strong>: Reduce batch size, enable gradient checkpointing</p>
</li>
<li><p><strong>Training too slow</strong>: Use larger instance or spot instances</p>
</li>
<li><p><strong>Model not loading</strong>: Check Hugging Face token for Llama 3 access</p>
</li>
</ol>
<p>        This is the <strong>complete, end-to-end guide</strong> with every single step. Copy and run each command in order, and you'll have a fine-tuned model running in production.</p>
]]></content:encoded></item><item><title><![CDATA[Beyond ChatGPT: Building Your Own Enterprise RAG Chatbot with Amazon Bedrock & Knowledge Bases]]></title><description><![CDATA[Introduction: The Limitations of Generic LLMs
While ChatGPT has revolutionized how we interact with AI, enterprises face critical challenges when using generic large language models:

Outdated Knowledge: Models are trained on data up to a specific cu...]]></description><link>https://blog.omprakashthakur.com.np/beyond-chatgpt-building-your-own-enterprise-rag-chatbot-with-amazon-bedrock-and-knowledge-bases</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/beyond-chatgpt-building-your-own-enterprise-rag-chatbot-with-amazon-bedrock-and-knowledge-bases</guid><category><![CDATA[AWS, Generative AI, Amazon Bedrock, RAG, Chatbot, Enterprise AI, Vector Database]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Thu, 01 Jan 2026 09:42:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767258194749/6421cd16-cb26-4324-8c2d-b6a512784a84.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction-the-limitations-of-generic-llms"><strong>Introduction: The Limitations of Generic LLMs</strong></h2>
<p>While ChatGPT has revolutionized how we interact with AI, enterprises face critical challenges when using generic large language models:</p>
<ol>
<li><p><strong>Outdated Knowledge:</strong> Models are trained on data up to a specific cutoff date</p>
</li>
<li><p><strong>No Access to Proprietary Data:</strong> Cannot answer questions about your internal documents, policies, or databases</p>
</li>
<li><p><strong>Hallucination Risk:</strong> Models may invent plausible-sounding but incorrect information</p>
</li>
<li><p><strong>Security Concerns:</strong> Sensitive data exposure when using public APIs</p>
</li>
</ol>
<p>The solution? <strong>Retrieval-Augmented Generation (RAG)</strong> - a technique that combines the power of LLMs with your proprietary data. In this comprehensive guide, we'll build a production-ready enterprise chatbot using AWS's managed services.</p>
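<p>In essence, RAG is a two-step loop: retrieve the passages most relevant to the query, then let the LLM answer using only those passages. A toy sketch of that loop (a keyword-overlap ranker stands in for the vector store, and a stub stands in for the Bedrock model call):</p>

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def generate(query: str, contexts: list[str]) -> str:
    """Stand-in for the LLM call: the model only sees the retrieved context."""
    return f"Grounded answer to {query!r} using {len(contexts)} passage(s)."

docs = [
    "Refunds are processed within 14 days of the return request.",
    "The VPN requires multi-factor authentication for all employees.",
]
print(generate("how long do refunds take", retrieve("how long do refunds take", docs)))
```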
<h2 id="heading-architecture-overview"><strong>Architecture Overview</strong></h2>
<p>Here's what we're building:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767258638727/edd103ed-0bd8-43b4-a3fc-4cc1b8cfac84.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<p>Before we begin, ensure you have:</p>
<ol>
<li><p><strong>AWS Account</strong> with appropriate permissions</p>
</li>
<li><p><strong>Amazon Bedrock Access</strong> requested (go to Bedrock console → Model access)</p>
</li>
<li><p><strong>Python 3.9+</strong> and <strong>AWS CLI</strong> configured</p>
</li>
<li><p><strong>Sample documents</strong> for testing (PDFs, Word docs, text files)</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767258750331/6299fdd0-911d-4deb-b3c4-ea63269525ba.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-step-1-setting-up-the-knowledge-base"><strong>Step 1: Setting Up the Knowledge Base</strong></h2>
<h3 id="heading-11-create-an-s3-bucket-for-your-documents"><strong>1.1 Create an S3 Bucket for Your Documents</strong></h3>
<pre><code class="lang-bash"> <span class="hljs-comment"># Create a unique bucket name</span>
 BUCKET_NAME=<span class="hljs-string">"enterprise-rag-documents-<span class="hljs-subst">$(date +%s)</span>"</span>
 aws s3 mb s3://<span class="hljs-variable">$BUCKET_NAME</span>

 <span class="hljs-comment"># Upload sample documents</span>
 aws s3 cp ./documents/ s3://<span class="hljs-variable">$BUCKET_NAME</span>/ --recursive
</code></pre>
<h3 id="heading-12-configure-amazon-bedrock-knowledge-base"><strong>1.2 Configure Amazon Bedrock Knowledge Base</strong></h3>
<p> Navigate to <strong>Amazon Bedrock → Knowledge Bases → Create Knowledge Base</strong></p>
<p> <strong>Configuration Parameters:</strong></p>
<ul>
<li><p><strong>Knowledge base name:</strong> <code>enterprise-knowledge-base</code></p>
</li>
<li><p><strong>IAM role:</strong> Create new role with S3 and Bedrock permissions</p>
</li>
<li><p><strong>Data source:</strong> Your S3 bucket</p>
</li>
<li><p><strong>Embeddings model:</strong> <code>amazon.titan-embed-text-v2:0</code> (default)</p>
</li>
<li><p><strong>Vector database:</strong> Choose <code>Quick create a new vector store</code></p>
</li>
<li><p><strong>Advanced settings:</strong> Enable hybrid search for better results</p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-json">    {
      <span class="hljs-string">"knowledgeBaseConfiguration"</span>: {
        <span class="hljs-string">"type"</span>: <span class="hljs-string">"VECTOR"</span>,
        <span class="hljs-string">"vectorKnowledgeBaseConfiguration"</span>: {
          <span class="hljs-string">"embeddingModelArn"</span>: <span class="hljs-string">"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"</span>
        }
      },
      <span class="hljs-string">"storageConfiguration"</span>: {
        <span class="hljs-string">"type"</span>: <span class="hljs-string">"OPENSEARCH_SERVERLESS"</span>,
        <span class="hljs-string">"opensearchServerlessConfiguration"</span>: {
          <span class="hljs-string">"collectionArn"</span>: <span class="hljs-string">"arn:aws:aoss:us-east-1:123456789012:collection/your-collection"</span>,
          <span class="hljs-string">"vectorIndexName"</span>: <span class="hljs-string">"enterprise-docs-index"</span>,
          <span class="hljs-string">"fieldMapping"</span>: {
            <span class="hljs-string">"vectorField"</span>: <span class="hljs-string">"embedding"</span>,
            <span class="hljs-string">"textField"</span>: <span class="hljs-string">"content"</span>,
            <span class="hljs-string">"metadataField"</span>: <span class="hljs-string">"metadata"</span>
          }
        }
      },
      <span class="hljs-string">"dataSourceConfiguration"</span>: {
        <span class="hljs-string">"type"</span>: <span class="hljs-string">"S3"</span>,
        <span class="hljs-string">"s3Configuration"</span>: {
          <span class="hljs-string">"bucketArn"</span>: <span class="hljs-string">"arn:aws:s3:::your-documents-bucket"</span>,
          <span class="hljs-string">"inclusionPrefixes"</span>: [<span class="hljs-string">"documents/"</span>]
        }
      }
    }
</code></pre>
<h2 id="heading-step-2-building-the-backend-orchestrator"><strong>Step 2: Building the Backend Orchestrator</strong></h2>
<h3 id="heading-21-create-lambda-function-with-dependencies"><strong>2.1 Create Lambda Function with Dependencies</strong></h3>
<p>    Create a <code>requirements.txt</code>:</p>
<pre><code class="lang-plaintext">    boto3&gt;=1.28.0
    aws-lambda-powertools&gt;=2.0.0
    python-dotenv&gt;=1.0.0
</code></pre>
<p>    Create the Lambda function:</p>
<pre><code class="lang-python">    <span class="hljs-comment"># lambda_handler.py</span>
    import json
    import boto3
    import os
    from typing import Dict, Any
    from botocore.exceptions import ClientError

    <span class="hljs-comment"># Initialize AWS clients</span>
    bedrock_agent_runtime = boto3.client(<span class="hljs-string">'bedrock-agent-runtime'</span>)
    bedrock = boto3.client(<span class="hljs-string">'bedrock-runtime'</span>)

    class RAGOrchestrator:
        def __init__(self, knowledge_base_id: str, model_id: str = <span class="hljs-string">"anthropic.claude-3-sonnet-20240229-v1:0"</span>):
            self.knowledge_base_id = knowledge_base_id
            self.model_id = model_id
            self.region = os.environ.get(<span class="hljs-string">'AWS_REGION'</span>, <span class="hljs-string">'us-east-1'</span>)

        def retrieve_context(self, query: str, max_results: int = 5) -&gt; Dict[str, Any]:
            <span class="hljs-string">""</span><span class="hljs-string">"Retrieve relevant context from knowledge base"</span><span class="hljs-string">""</span>
            try:
                response = bedrock_agent_runtime.retrieve(
                    knowledgeBaseId=self.knowledge_base_id,
                    retrievalQuery={
                        <span class="hljs-string">'text'</span>: query
                    },
                    retrievalConfiguration={
                        <span class="hljs-string">'vectorSearchConfiguration'</span>: {
                            <span class="hljs-string">'numberOfResults'</span>: max_results,
                            <span class="hljs-string">'overrideSearchType'</span>: <span class="hljs-string">'HYBRID'</span>
                        }
                    }
                )

                <span class="hljs-comment"># Extract and format retrieved passages</span>
                contexts = []
                <span class="hljs-keyword">for</span> result <span class="hljs-keyword">in</span> response.get(<span class="hljs-string">'retrievalResults'</span>, []):
                    contexts.append({
                        <span class="hljs-string">'content'</span>: result[<span class="hljs-string">'content'</span>][<span class="hljs-string">'text'</span>],
                        <span class="hljs-string">'metadata'</span>: result.get(<span class="hljs-string">'metadata'</span>, {}),
                        <span class="hljs-string">'score'</span>: result.get(<span class="hljs-string">'score'</span>, 0.0)
                    })

                <span class="hljs-built_in">return</span> {
                    <span class="hljs-string">'contexts'</span>: contexts,
                    <span class="hljs-string">'total_results'</span>: len(contexts)
                }

            except ClientError as e:
                <span class="hljs-built_in">print</span>(f<span class="hljs-string">"Error retrieving context: {e}"</span>)
                <span class="hljs-built_in">return</span> {<span class="hljs-string">'contexts'</span>: [], <span class="hljs-string">'total_results'</span>: 0}

        def generate_response(self, query: str, context: str) -&gt; str:
            <span class="hljs-string">""</span><span class="hljs-string">"Generate response using LLM with retrieved context"</span><span class="hljs-string">""</span>

            <span class="hljs-comment"># Prepare the prompt with context</span>
            prompt = f<span class="hljs-string">""</span><span class="hljs-string">"Human: You are an expert assistant for our enterprise. Use the following context to answer the question.

            Context:
            {context}

            Question: {query}

            Instructions:
            1. Answer based ONLY on the provided context
            2. If the context doesn't contain relevant information, say "</span>I don<span class="hljs-string">'t have enough information to answer this question based on the available documents."
            3. Cite specific sources when possible
            4. Keep the response concise and professional

            Assistant:"""

            try:
                # For Claude models
                response = bedrock.invoke_model(
                    modelId=self.model_id,
                    body=json.dumps({
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": 1000,
                        "messages": [
                            {
                                "role": "user",
                                "content": prompt
                            }
                        ]
                    }),
                    contentType='</span>application/json<span class="hljs-string">'
                )

                response_body = json.loads(response['</span>body<span class="hljs-string">'].read())
                return response_body['</span>content<span class="hljs-string">'][0]['</span>text<span class="hljs-string">']

            except ClientError as e:
                print(f"Error generating response: {e}")
                return "I apologize, but I'</span>m having trouble generating a response at the moment.<span class="hljs-string">"

    def lambda_handler(event, context):
        "</span><span class="hljs-string">""</span>Main Lambda handler<span class="hljs-string">""</span><span class="hljs-string">"

        # Extract query from event
        query = event.get('query', '').strip()
        if not query:
            return {
                'statusCode': 400,
                'body': json.dumps({'error': 'Query is required'})
            }

        # Initialize orchestrator
        knowledge_base_id = os.environ['KNOWLEDGE_BASE_ID']
        orchestrator = RAGOrchestrator(knowledge_base_id)

        # Step 1: Retrieve relevant context
        retrieval_result = orchestrator.retrieve_context(query)

        if retrieval_result['total_results'] == 0:
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'response': "</span>I couldn<span class="hljs-string">'t find relevant information in our knowledge base to answer your question.",
                    '</span>sources<span class="hljs-string">': []
                })
            }

        # Combine retrieved contexts
        combined_context = "\n\n".join([
            f"Source {i+1}:\n{ctx['</span>content<span class="hljs-string">']}\n[Metadata: {ctx['</span>metadata<span class="hljs-string">']}]"
            for i, ctx in enumerate(retrieval_result['</span>contexts<span class="hljs-string">'])
        ])

        # Step 2: Generate response using LLM
        response = orchestrator.generate_response(query, combined_context)

        # Prepare sources for citation
        sources = [
            {
                '</span>content<span class="hljs-string">': ctx['</span>content<span class="hljs-string">'][:200] + '</span>...<span class="hljs-string">',  # Preview
                '</span>metadata<span class="hljs-string">': ctx['</span>metadata<span class="hljs-string">'],
                '</span>relevance_score<span class="hljs-string">': ctx['</span>score<span class="hljs-string">']
            }
            for ctx in retrieval_result['</span>contexts<span class="hljs-string">']
        ]

        return {
            '</span>statusCode<span class="hljs-string">': 200,
            '</span>body<span class="hljs-string">': json.dumps({
                '</span>response<span class="hljs-string">': response,
                '</span>sources<span class="hljs-string">': sources,
                '</span>retrieved_context_count<span class="hljs-string">': retrieval_result['</span>total_results<span class="hljs-string">']
            })
        }</span>
</code></pre>
<h3 id="heading-22-deploy-with-aws-sam-optional"><strong>2.2 Deploy with AWS SAM (Optional)</strong></h3>
<p>    Create a <code>template.yaml</code> for easy deployment:</p>
<pre><code class="lang-yaml">    AWSTemplateFormatVersion: <span class="hljs-string">'2010-09-09'</span>
    Transform: AWS::Serverless-2016-10-31
    Description: Enterprise RAG Chatbot

    Resources:
      RagChatbotFunction:
        Type: AWS::Serverless::Function
        Properties:
          CodeUri: lambda/
          Handler: lambda_handler.lambda_handler
          Runtime: python3.9
          Timeout: 30
          MemorySize: 512
          Environment:
            Variables:
              KNOWLEDGE_BASE_ID: !Ref KnowledgeBaseId
          Policies:
            - BedrockKnowledgeBasePolicy:
                KnowledgeBaseId: !Ref KnowledgeBaseId
            - S3ReadPolicy:
                BucketName: !Ref DocumentBucket
          Events:
            ApiEvent:
              Type: Api
              Properties:
                Path: /query
                Method: post

      DocumentBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: !Sub enterprise-docs-<span class="hljs-variable">${AWS::AccountId}</span>

    Outputs:
      ApiEndpoint:
        Description: <span class="hljs-string">"API Gateway endpoint URL"</span>
        Value: !Sub <span class="hljs-string">"https://<span class="hljs-variable">${ServerlessRestApi}</span>.execute-api.<span class="hljs-variable">${AWS::Region}</span>.amazonaws.com/Prod/query"</span>
</code></pre>
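<p>After <code>sam deploy</code>, the stack's <code>ApiEndpoint</code> output gives you the URL to query. A minimal smoke test against it (the endpoint URL below is a placeholder; the request/response shape matches the Lambda above):</p>

```python
import json
import urllib.request

# Placeholder -- use the ApiEndpoint value from your stack outputs.
API_ENDPOINT = "https://YOUR_API_ID.execute-api.us-east-1.amazonaws.com/Prod/query"

def ask(query: str) -> dict:
    """POST a question to the RAG endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Uncomment once the stack is deployed:
# print(ask("What is our refund policy?")["response"])
```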
<h2 id="heading-step-3-creating-a-simple-web-interface"><strong>Step 3: Creating a Simple Web Interface</strong></h2>
<p>    Create a basic React frontend (<code>index.html</code>):</p>
<pre><code class="lang-html">    &lt;!DOCTYPE html&gt;
    &lt;html lang=<span class="hljs-string">"en"</span>&gt;
    &lt;head&gt;
        &lt;meta charset=<span class="hljs-string">"UTF-8"</span>&gt;
        &lt;meta name=<span class="hljs-string">"viewport"</span> content=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>&gt;
        &lt;title&gt;Enterprise RAG Chatbot&lt;/title&gt;
        &lt;script src=<span class="hljs-string">"https://cdn.tailwindcss.com"</span>&gt;&lt;/script&gt;
        &lt;link rel=<span class="hljs-string">"stylesheet"</span> href=<span class="hljs-string">"https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css"</span>&gt;
    &lt;/head&gt;
    &lt;body class=<span class="hljs-string">"bg-gray-50 min-h-screen"</span>&gt;
        &lt;div class=<span class="hljs-string">"container mx-auto px-4 py-8 max-w-4xl"</span>&gt;
            &lt;header class=<span class="hljs-string">"mb-8"</span>&gt;
                &lt;h1 class=<span class="hljs-string">"text-3xl font-bold text-gray-800 mb-2"</span>&gt;
                    &lt;i class=<span class="hljs-string">"fas fa-robot mr-3 text-blue-500"</span>&gt;&lt;/i&gt;
                    Enterprise Knowledge Assistant
                &lt;/h1&gt;
                &lt;p class=<span class="hljs-string">"text-gray-600"</span>&gt;Ask questions about your company documents, policies, and procedures.&lt;/p&gt;
            &lt;/header&gt;

            &lt;div class=<span class="hljs-string">"bg-white rounded-lg shadow-lg p-6 mb-6"</span>&gt;
                &lt;div id=<span class="hljs-string">"chat-container"</span> class=<span class="hljs-string">"h-96 overflow-y-auto mb-4 p-4 border rounded-lg bg-gray-50"</span>&gt;
                    &lt;div class=<span class="hljs-string">"text-center text-gray-500 py-8"</span>&gt;
                        &lt;i class=<span class="hljs-string">"fas fa-comments text-3xl mb-3"</span>&gt;&lt;/i&gt;
                        &lt;p&gt;Start a conversation by typing your question below.&lt;/p&gt;
                    &lt;/div&gt;
                &lt;/div&gt;

                &lt;div class=<span class="hljs-string">"flex space-x-4"</span>&gt;
                    &lt;input 
                        <span class="hljs-built_in">type</span>=<span class="hljs-string">"text"</span> 
                        id=<span class="hljs-string">"query-input"</span> 
                        placeholder=<span class="hljs-string">"Ask about company policies, procedures, or documents..."</span> 
                        class=<span class="hljs-string">"flex-grow p-3 border rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-blue-500 outline-none"</span>
                    &gt;
                    &lt;button 
                        id=<span class="hljs-string">"send-btn"</span> 
                        class=<span class="hljs-string">"bg-blue-500 text-white px-6 py-3 rounded-lg hover:bg-blue-600 transition font-semibold"</span>
                    &gt;
                        &lt;i class=<span class="hljs-string">"fas fa-paper-plane mr-2"</span>&gt;&lt;/i&gt;Ask
                    &lt;/button&gt;
                &lt;/div&gt;

                &lt;div class=<span class="hljs-string">"mt-4 text-sm text-gray-500"</span>&gt;
                    &lt;p&gt;&lt;i class=<span class="hljs-string">"fas fa-info-circle mr-1"</span>&gt;&lt;/i&gt; This chatbot searches through all company documents to find accurate answers.&lt;/p&gt;
                &lt;/div&gt;
            &lt;/div&gt;

            &lt;div id=<span class="hljs-string">"sources-panel"</span> class=<span class="hljs-string">"bg-white rounded-lg shadow-lg p-6 hidden"</span>&gt;
                &lt;h3 class=<span class="hljs-string">"text-lg font-semibold mb-4 text-gray-700"</span>&gt;
                    &lt;i class=<span class="hljs-string">"fas fa-file-alt mr-2"</span>&gt;&lt;/i&gt;Sources Used
                &lt;/h3&gt;
                &lt;div id=<span class="hljs-string">"sources-list"</span>&gt;&lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;

        &lt;script&gt;
            const API_ENDPOINT = <span class="hljs-string">'YOUR_API_GATEWAY_ENDPOINT'</span>; // Replace with your endpoint

            document.getElementById(<span class="hljs-string">'send-btn'</span>).addEventListener(<span class="hljs-string">'click'</span>, sendQuery);
            document.getElementById(<span class="hljs-string">'query-input'</span>).addEventListener(<span class="hljs-string">'keypress'</span>, (e) =&gt; {
                <span class="hljs-keyword">if</span> (e.key === <span class="hljs-string">'Enter'</span>) sendQuery();
            });

            async <span class="hljs-keyword">function</span> <span class="hljs-function"><span class="hljs-title">sendQuery</span></span>() {
                const queryInput = document.getElementById(<span class="hljs-string">'query-input'</span>);
                const query = queryInput.value.trim();

                <span class="hljs-keyword">if</span> (!query) <span class="hljs-built_in">return</span>;

                // Add user message to chat
                addMessage(query, <span class="hljs-string">'user'</span>);
                queryInput.value = <span class="hljs-string">''</span>;

                // Show typing indicator
                const typingId = showTypingIndicator();

                try {
                    const response = await fetch(API_ENDPOINT, {
                        method: <span class="hljs-string">'POST'</span>,
                        headers: {
                            <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/json'</span>,
                        },
                        body: JSON.stringify({ query: query })
                    });

                    const data = await response.json();

                    // Remove typing indicator
                    removeTypingIndicator(typingId);

                    // Add AI response
                    addMessage(data.response, <span class="hljs-string">'ai'</span>);

                    // Show sources <span class="hljs-keyword">if</span> available
                    <span class="hljs-keyword">if</span> (data.sources &amp;&amp; data.sources.length &gt; 0) {
                        showSources(data.sources);
                    }

                } catch (error) {
                    console.error(<span class="hljs-string">'Error:'</span>, error);
                    removeTypingIndicator(typingId);
                    addMessage(<span class="hljs-string">'Sorry, there was an error processing your request.'</span>, <span class="hljs-string">'ai'</span>);
                }
            }

            <span class="hljs-keyword">function</span> addMessage(content, sender) {
                const chatContainer = document.getElementById(<span class="hljs-string">'chat-container'</span>);

                const messageDiv = document.createElement(<span class="hljs-string">'div'</span>);
                messageDiv.className = `mb-4 <span class="hljs-variable">${sender === 'user' ? 'text-right' : ''}</span>`;

                const bubble = document.createElement(<span class="hljs-string">'div'</span>);
                bubble.className = `inline-block p-4 rounded-lg max-w-xs md:max-w-md <span class="hljs-variable">${
                    sender === 'user' 
                        ? 'bg-blue-500 text-white rounded-br-none' 
                        : 'bg-gray-200 text-gray-800 rounded-bl-none'
                }</span>`;

                bubble.innerHTML = `&lt;p class=<span class="hljs-string">"whitespace-pre-wrap"</span>&gt;<span class="hljs-variable">${content}</span>&lt;/p&gt;`;

                messageDiv.appendChild(bubble);
                chatContainer.appendChild(messageDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
            }

            <span class="hljs-keyword">function</span> <span class="hljs-function"><span class="hljs-title">showTypingIndicator</span></span>() {
                const chatContainer = document.getElementById(<span class="hljs-string">'chat-container'</span>);
                const typingDiv = document.createElement(<span class="hljs-string">'div'</span>);
                typingDiv.id = <span class="hljs-string">'typing-indicator'</span>;
                typingDiv.className = <span class="hljs-string">'mb-4'</span>;
                typingDiv.innerHTML = `
                    &lt;div class=<span class="hljs-string">"inline-block p-4 rounded-lg bg-gray-200 rounded-bl-none"</span>&gt;
                        &lt;div class=<span class="hljs-string">"flex space-x-1"</span>&gt;
                            &lt;div class=<span class="hljs-string">"w-2 h-2 bg-gray-500 rounded-full animate-bounce"</span>&gt;&lt;/div&gt;
                            &lt;div class=<span class="hljs-string">"w-2 h-2 bg-gray-500 rounded-full animate-bounce"</span> style=<span class="hljs-string">"animation-delay: 0.2s"</span>&gt;&lt;/div&gt;
                            &lt;div class=<span class="hljs-string">"w-2 h-2 bg-gray-500 rounded-full animate-bounce"</span> style=<span class="hljs-string">"animation-delay: 0.4s"</span>&gt;&lt;/div&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                `;
                chatContainer.appendChild(typingDiv);
                chatContainer.scrollTop = chatContainer.scrollHeight;
                <span class="hljs-built_in">return</span> <span class="hljs-string">'typing-indicator'</span>;
            }

            <span class="hljs-keyword">function</span> removeTypingIndicator(id) {
                const indicator = document.getElementById(id);
                <span class="hljs-keyword">if</span> (indicator) indicator.remove();
            }

            <span class="hljs-keyword">function</span> showSources(sources) {
                const sourcesPanel = document.getElementById(<span class="hljs-string">'sources-panel'</span>);
                const sourcesList = document.getElementById(<span class="hljs-string">'sources-list'</span>);

                sourcesPanel.classList.remove(<span class="hljs-string">'hidden'</span>);
                sourcesList.innerHTML = <span class="hljs-string">''</span>;

                sources.forEach((<span class="hljs-built_in">source</span>, index) =&gt; {
                    const sourceDiv = document.createElement(<span class="hljs-string">'div'</span>);
                    sourceDiv.className = <span class="hljs-string">'mb-3 p-3 border rounded-lg hover:bg-gray-50'</span>;
                    sourceDiv.innerHTML = `
                        &lt;div class=<span class="hljs-string">"flex justify-between items-start"</span>&gt;
                            &lt;h4 class=<span class="hljs-string">"font-medium text-gray-800"</span>&gt;Source <span class="hljs-variable">${index + 1}</span>&lt;/h4&gt;
                            &lt;span class=<span class="hljs-string">"text-xs bg-blue-100 text-blue-800 px-2 py-1 rounded"</span>&gt;Score: <span class="hljs-variable">${source.relevance_score.toFixed(3)}</span>&lt;/span&gt;
                        &lt;/div&gt;
                        &lt;p class=<span class="hljs-string">"text-sm text-gray-600 mt-2"</span>&gt;<span class="hljs-variable">${source.content}</span>&lt;/p&gt;
                        &lt;div class=<span class="hljs-string">"text-xs text-gray-500 mt-2"</span>&gt;
                            &lt;i class=<span class="hljs-string">"fas fa-tag mr-1"</span>&gt;&lt;/i&gt;<span class="hljs-variable">${JSON.stringify(source.metadata)}</span>
                        &lt;/div&gt;
                    `;
                    sourcesList.appendChild(sourceDiv);
                });
            }
        &lt;/script&gt;
    &lt;/body&gt;
    &lt;/html&gt;
</code></pre>
<h2 id="heading-step-4-advanced-features-amp-optimization"><strong>Step 4: Advanced Features &amp; Optimization</strong></h2>
<h3 id="heading-41-implementing-conversation-memory"><strong>4.1 Implementing Conversation Memory</strong></h3>
<p>    Add a DynamoDB table for conversation history:</p>
<pre><code class="lang-python">    <span class="hljs-comment"># Add to your Lambda function</span>
    import boto3
    from datetime import datetime

    dynamodb = boto3.resource(<span class="hljs-string">'dynamodb'</span>)
    conversation_table = dynamodb.Table(<span class="hljs-string">'RAGConversations'</span>)

    class ConversationManager:
        def __init__(self, session_id):
            self.session_id = session_id

        def save_interaction(self, query: str, response: str, sources: list):
            timestamp = datetime.utcnow().isoformat()

            conversation_table.put_item(
                Item={
                    <span class="hljs-string">'session_id'</span>: self.session_id,
                    <span class="hljs-string">'timestamp'</span>: timestamp,
                    <span class="hljs-string">'query'</span>: query,
                    <span class="hljs-string">'response'</span>: response,
                    <span class="hljs-string">'sources'</span>: sources,
                    <span class="hljs-string">'ttl'</span>: int(datetime.utcnow().timestamp()) + 86400  <span class="hljs-comment"># 24-hour TTL</span>
                }
            )

        def get_conversation_history(self, <span class="hljs-built_in">limit</span>: int = 5):
            response = conversation_table.query(
                KeyConditionExpression=<span class="hljs-string">'session_id = :sid'</span>,
                ExpressionAttributeValues={<span class="hljs-string">':sid'</span>: self.session_id},
                ScanIndexForward=False,
                Limit=<span class="hljs-built_in">limit</span>
            )
            <span class="hljs-built_in">return</span> response.get(<span class="hljs-string">'Items'</span>, [])
</code></pre>
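<p>    To make use of the stored history, the most recent turns can be folded into the model prompt before the next Bedrock call. The helper below is an illustrative sketch (the function name and prompt format are assumptions, not an AWS API) and expects items shaped like those written by <code>save_interaction</code>:</p>
<pre><code class="lang-python">def build_prompt_with_history(query, history, max_turns=3):
    # `history` is the newest-first list returned by get_conversation_history()
    turns = list(reversed(history[:max_turns]))  # replay oldest turn first
    lines = []
    for item in turns:
        lines.append(f"User: {item['query']}")
        lines.append(f"Assistant: {item['response']}")
    lines.append(f"User: {query}")
    return "\n".join(lines)
</code></pre>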
<h3 id="heading-42-adding-document-level-access-control"><strong>4.2 Adding Document-Level Access Control</strong></h3>
<p>    Implement metadata filtering based on user roles:</p>
<pre><code class="lang-python">    def retrieve_with_access_control(query: str, user_roles: list):
        <span class="hljs-comment"># Add metadata filter based on user roles</span>
        filter_conditions = {
            <span class="hljs-string">'andAll'</span>: [
                {
                    <span class="hljs-string">'equals'</span>: {
                        <span class="hljs-string">'key'</span>: <span class="hljs-string">'allowed_roles'</span>,
                        <span class="hljs-string">'value'</span>: user_role
                    }
                }
                <span class="hljs-keyword">for</span> user_role <span class="hljs-keyword">in</span> user_roles
            ]
        }

        response = bedrock_agent_runtime.retrieve(
            knowledgeBaseId=knowledge_base_id,
            retrievalQuery={<span class="hljs-string">'text'</span>: query},
            retrievalConfiguration={
                <span class="hljs-string">'vectorSearchConfiguration'</span>: {
                    <span class="hljs-string">'filter'</span>: filter_conditions,
                    <span class="hljs-string">'numberOfResults'</span>: 5
                }
            }
        )
        <span class="hljs-built_in">return</span> response
</code></pre>
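<p>    Note that a user's roles usually carry OR semantics: a document should be retrievable if the user holds <em>any</em> allowed role, so <code>orAll</code> is often the intended combinator rather than <code>andAll</code>. A small helper for building the filter (the metadata key is carried over from the snippet above) keeps this logic testable:</p>
<pre><code class="lang-python">def build_role_filter(user_roles, key="allowed_roles"):
    # One condition per role; Bedrock's andAll/orAll combinators expect
    # at least two members, so a single role is returned as a bare condition.
    conditions = [{"equals": {"key": key, "value": role}} for role in user_roles]
    if len(conditions) == 1:
        return conditions[0]
    return {"orAll": conditions}
</code></pre>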
<h2 id="heading-step-5-testing-amp-validation"><strong>Step 5: Testing &amp; Validation</strong></h2>
<h3 id="heading-test-cases-to-validate-your-rag-system"><strong>Test Cases to Validate Your RAG System:</strong></h3>
<pre><code class="lang-python">    test_cases = [
        {
            <span class="hljs-string">"query"</span>: <span class="hljs-string">"What is our vacation policy for senior employees?"</span>,
            <span class="hljs-string">"expected_characteristics"</span>: [<span class="hljs-string">"should cite HR documents"</span>, <span class="hljs-string">"mention specific vacation days"</span>]
        },
        {
            <span class="hljs-string">"query"</span>: <span class="hljs-string">"How do I submit an expense report?"</span>,
            <span class="hljs-string">"expected_characteristics"</span>: [<span class="hljs-string">"mention the expense portal"</span>, <span class="hljs-string">"provide step-by-step instructions"</span>]
        },
        {
            <span class="hljs-string">"query"</span>: <span class="hljs-string">"What was our Q3 revenue?"</span>,
            <span class="hljs-string">"expected_characteristics"</span>: [<span class="hljs-string">"cite financial reports"</span>, <span class="hljs-string">"provide specific numbers"</span>]
        }
    ]

    <span class="hljs-comment"># Evaluation metrics to track:</span>
    <span class="hljs-comment"># 1. Response Relevance (0-5 scale)</span>
    <span class="hljs-comment"># 2. Citation Accuracy (are sources actually relevant?)</span>
    <span class="hljs-comment"># 3. Hallucination Rate (percentage of made-up information)</span>
    <span class="hljs-comment"># 4. Response Time (should be under 5 seconds)</span>
</code></pre>
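<p>    These expectations can be turned into a cheap, repeatable smoke test with a keyword heuristic: score each response by how many expected phrases it contains. This is an illustrative sketch only, not a substitute for human or LLM-based relevance evaluation:</p>
<pre><code class="lang-python">def evaluate_response(response_text, expected_keywords):
    # Fraction of expected phrases that appear in the response, 0.0 to 1.0
    text = response_text.lower()
    hits = [kw for kw in expected_keywords if kw.lower() in text]
    return len(hits) / len(expected_keywords)
</code></pre>
<p>    A score of 1.0 means every expected phrase was present; consistently low scores on a query usually point at a retrieval problem rather than a generation problem.</p>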
<h2 id="heading-cost-estimation-amp-optimization"><strong>Cost Estimation &amp; Optimization</strong></h2>
<p>    <strong>Monthly Cost Breakdown (Estimated):</strong></p>
<ul>
<li><p><strong>Amazon Bedrock (Claude 3 Sonnet):</strong> ~$3 per 1M input tokens</p>
</li>
<li><p><strong>OpenSearch Serverless:</strong> ~$0.30 per OCU-hour (1 OCU running continuously ≈ $216/month)</p>
</li>
<li><p><strong>Lambda:</strong> ~$0.20 per million requests (128MB, 3s average)</p>
</li>
<li><p><strong>S3:</strong> ~$0.023 per GB storage</p>
</li>
</ul>
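<p>    Plugging the unit prices above into a quick calculator gives a rough monthly figure. This is an illustrative sketch: the traffic numbers in the example are assumptions, and output-token and Lambda compute (GB-second) charges are deliberately left out:</p>
<pre><code class="lang-python">def estimate_monthly_cost(queries_per_month, avg_input_tokens, storage_gb, ocus=1):
    # Unit prices from the breakdown above (input tokens only)
    bedrock = queries_per_month * avg_input_tokens / 1_000_000 * 3.00  # $3 per 1M input tokens
    opensearch = ocus * 0.30 * 720                                     # $0.30 per OCU-hour, ~720 h/month
    lambda_requests = queries_per_month / 1_000_000 * 0.20             # $0.20 per 1M requests
    s3 = storage_gb * 0.023                                            # $0.023 per GB-month
    return round(bedrock + opensearch + lambda_requests + s3, 2)
</code></pre>
<p>    For example, 100,000 queries/month at ~2,000 input tokens each, 50 GB of documents, and a single OCU works out to roughly $817/month, with OpenSearch the dominant fixed cost.</p>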
<p>    <strong>Cost Optimization Tips:</strong></p>
<ol>
<li><p><strong>Use caching:</strong> Cache frequent queries in DynamoDB</p>
</li>
<li><p><strong>Implement query optimization:</strong> Use query rewriting to improve retrieval</p>
</li>
<li><p><strong>Monitor usage:</strong> Set up CloudWatch alarms for cost thresholds</p>
</li>
<li><p><strong>Consider smaller models:</strong> Use Claude Haiku for simpler queries</p>
</li>
</ol>
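<p>    Tip 1 (caching) can be sketched as follows. This example uses an in-process dict so the logic is easy to test; in production the same key/TTL scheme maps onto a DynamoDB table with a TTL attribute, as in the conversation-memory example earlier:</p>
<pre><code class="lang-python">import hashlib
import time

_cache = {}

def cache_key(query):
    # Normalize so trivially different phrasings of the same question share an entry
    return hashlib.sha256(query.strip().lower().encode()).hexdigest()

def cached_answer(query, answer_fn, ttl_seconds=3600):
    key = cache_key(query)
    entry = _cache.get(key)
    # Serve from cache only while the entry is younger than the TTL
    if entry and ttl_seconds > time.time() - entry[0]:
        return entry[1]
    answer = answer_fn(query)
    _cache[key] = (time.time(), answer)
    return answer
</code></pre>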
<h2 id="heading-best-practices-for-production"><strong>Best Practices for Production</strong></h2>
<ol>
<li><p><strong>Data Pipeline Management:</strong></p>
<ul>
<li><p>Automate document ingestion with S3 Event Notifications</p>
</li>
<li><p>Implement data quality checks before indexing</p>
</li>
<li><p>Schedule regular knowledge base synchronization</p>
</li>
</ul>
</li>
<li><p><strong>Security:</strong></p>
<ul>
<li><p>Encrypt data at rest (S3 SSE-S3/SSE-KMS)</p>
</li>
<li><p>Implement API authentication (Cognito, API Keys)</p>
</li>
<li><p>Use VPC endpoints for private access</p>
</li>
<li><p>Enable Bedrock guardrails for content filtering</p>
</li>
</ul>
</li>
<li><p><strong>Monitoring:</strong></p>
<ul>
<li><p>Track retrieval hit/miss rates</p>
</li>
<li><p>Monitor response latency (95th percentile &lt; 2s)</p>
</li>
<li><p>Set up user feedback collection (thumbs up/down)</p>
</li>
<li><p>Log all queries for compliance</p>
</li>
</ul>
</li>
<li><p><strong>Performance Tuning:</strong></p>
<ul>
<li><p>Experiment with different embedding models</p>
</li>
<li><p>Adjust chunking strategy (size, overlap)</p>
</li>
<li><p>Implement query expansion techniques</p>
</li>
<li><p>Use metadata filtering for better precision</p>
</li>
</ul>
</li>
</ol>
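<p>    The first best practice above, automating ingestion with S3 Event Notifications, can be sketched as a small Lambda handler. <code>start_ingestion_job</code> is the Bedrock Agent API call that triggers a knowledge base sync; the knowledge base and data source IDs below are placeholders you would substitute with your own:</p>
<pre><code class="lang-python">def extract_s3_objects(event):
    # Pull (bucket, key) pairs out of an S3 Event Notification payload
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            pairs.append((bucket, key))
    return pairs

def lambda_handler(event, context):
    if extract_s3_objects(event):
        import boto3  # deferred so the parsing helper stays testable without AWS
        client = boto3.client("bedrock-agent")
        # Kick off a sync so newly uploaded documents get re-indexed
        client.start_ingestion_job(
            knowledgeBaseId="YOUR_KB_ID",        # placeholder
            dataSourceId="YOUR_DATA_SOURCE_ID",  # placeholder
        )
    return {"statusCode": 200}
</code></pre>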
<h2 id="heading-common-pitfalls-amp-solutions"><strong>Common Pitfalls &amp; Solutions</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Pitfall</strong></td><td><strong>Solution</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Poor retrieval quality</td><td>Implement hybrid search, adjust chunk sizes, add metadata filtering</td></tr>
<tr>
<td>Hallucinations</td><td>Add strict prompt instructions, implement confidence scoring</td></tr>
<tr>
<td>Slow response times</td><td>Add caching, optimize Lambda memory, use async processing</td></tr>
<tr>
<td>Irrelevant sources</td><td>Fine-tune embedding model, improve document preprocessing</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>    Building an enterprise RAG chatbot with Amazon Bedrock provides a powerful, scalable solution for making proprietary data accessible through natural language. The managed services approach significantly reduces operational overhead while providing enterprise-grade security and reliability.</p>
<p>    <strong>Key Advantages of This Architecture:</strong></p>
<ul>
<li><p>✅ <strong>No infrastructure management</strong> - Fully managed by AWS</p>
</li>
<li><p>✅ <strong>Enterprise security</strong> - Private, compliant, and secure</p>
</li>
<li><p>✅ <strong>Scalable</strong> - Scales smoothly from a handful to thousands of queries per second</p>
</li>
<li><p>✅ <strong>Cost-effective</strong> - Pay-per-use pricing model</p>
</li>
<li><p>✅ <strong>Accurate</strong> - Grounded in your actual documents</p>
</li>
</ul>
<p>    <strong>Next Steps for Your Implementation:</strong></p>
<ol>
<li><p>Start with a pilot department (e.g., HR or IT documentation)</p>
</li>
<li><p>Collect user feedback and iterate on prompt engineering</p>
</li>
<li><p>Implement advanced features like multi-modal support (images, tables)</p>
</li>
<li><p>Consider fine-tuning embeddings on your domain-specific data</p>
</li>
<li><p>Explore integration with existing systems (SharePoint, Confluence, Salesforce)</p>
</li>
</ol>
<hr />
<p>    <strong>Resources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/bedrock/">Amazon Bedrock Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-best-practices.html">RAG Best Practices Guide</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/aws-samples/amazon-bedrock-samples">Sample Code Repository</a></p>
</li>
</ul>
<p>    <em>Need help implementing this? Have questions about specific use cases? Leave a comment below or reach out to me on</em> <a target="_blank" href="https://www.linkedin.com/in/omthakurofficial/">LinkedIn</a><em>.</em></p>
<hr />
<p>    <strong>Ready to deploy?</strong> Use the <strong>AWS CloudFormation template</strong> below for a one-click deployment:</p>
<pre><code class="lang-bash">    <span class="hljs-comment"># Save as rag-chatbot-cfn.yaml</span>
    <span class="hljs-comment"># Deploy with: aws cloudformation create-stack --stack-name enterprise-rag-chatbot --template-body file://rag-chatbot-cfn.yaml</span>
</code></pre>
]]></content:encoded></item><item><title><![CDATA[The DevOps Roadmap: A Guide to Becoming a DevOps Engineer Professional]]></title><description><![CDATA[DevOps is a cultural and collaborative mindset that emphasizes communication, collaboration, integration, and automation between development and operations teams to achieve faster and more reliable software delivery. DevOps engineers are professional...]]></description><link>https://blog.omprakashthakur.com.np/the-devops-roadmap-a-guide-to-becoming-a-devops-engineer-professional</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/the-devops-roadmap-a-guide-to-becoming-a-devops-engineer-professional</guid><category><![CDATA[Devops]]></category><category><![CDATA[aws devops]]></category><category><![CDATA[azure-devops]]></category><category><![CDATA[Devops articles]]></category><category><![CDATA[#Devopscommunity]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Wed, 10 Jan 2024 04:25:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1704859960576/6de36e6c-3c19-4e80-8b90-b485366f2dc3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DevOps is a cultural and collaborative mindset that emphasizes communication, collaboration, integration, and automation between development and operations teams to achieve faster and more reliable software delivery. DevOps engineers are professionals with the skills and knowledge to work across the entire software creation and maintenance process, from development to operations, encompassing the entire technology stack.</p>
<p>But how can you become a DevOps engineer? What are the steps and skills you need to learn and master? In this article, we will provide you with a DevOps roadmap, which is a visual guide that shows the main steps and concepts you need to follow and understand to become a successful DevOps engineer.</p>
<h2 id="heading-the-devops-roadmap"><strong>The DevOps Roadmap</strong></h2>
<p>The DevOps roadmap below covers a lot of topics within software development. You don't need to learn everything at once, but you should have a general idea of what each topic entails and how it relates to DevOps. You can also use this roadmap as a reference to dive deeper into the topics that interest you or that you need to improve on.</p>
<h3 id="heading-devops-career-roadmap-steps"><strong>DevOps Career Roadmap Steps</strong></h3>
<ol>
<li><p>Learn programming languages.</p>
</li>
<li><p>Study operating systems.</p>
</li>
<li><p>Review networking security and protocols.</p>
</li>
<li><p>Understand Infrastructure as Code.</p>
</li>
<li><p>Adopt Continuous Integration/Continuous Deployment tools.</p>
</li>
<li><p>Invest in application and infrastructure monitoring.</p>
</li>
<li><p>Study cloud providers.</p>
</li>
<li><p>Learn cloud design patterns.</p>
</li>
</ol>
<p>Let's break down each of these steps in more detail.</p>
<h3 id="heading-1-learn-programming-languages"><strong>1. Learn programming languages.</strong></h3>
<p>Although DevOps engineers do not typically write source code, they do integrate databases, debug code from the development team, and automate processes. Automation is a critical part of what gives the DevOps lifecycle its speed, and a DevOps engineer plays an important role in implementing a DevOps automation strategy.</p>
<p>Additionally, a DevOps engineer should have a working knowledge of the languages their team is using to help them understand existing code, review new code, and assist with debugging.</p>
<p>Programming languages to learn include:</p>
<ul>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1704823774306/d98db40d-c08f-4fc6-aa4b-7ae523663cb3.png" alt class="image--center mx-auto" /></p>
<p>  Go (recommended)</p>
</li>
<li><p>Ruby</p>
</li>
<li><p>Python</p>
</li>
<li><p>Node.js</p>
</li>
</ul>
<h3 id="heading-2-study-operating-systems"><strong>2. Study operating systems.</strong></h3>
<p>Operating systems (OSs) are a crucial piece of the technology stack that a DevOps team needs to function. OSs not only power the local machines that the team uses to communicate and complete tasks, but they also run the servers that host the team's deployed applications.</p>
<p>As such, you need to learn the command line terminal so you are not reliant on the graphic user interface (GUI) to configure your servers. Command line simplifies tasks that would require multiple clicks in a GUI, and some commands are only executable through the terminal.</p>
<p>Every OS is different, so learning more than one is advisable. Popular OSs to learn include:</p>
<ul>
<li><p>Linux (recommended)</p>
</li>
<li><p>Unix</p>
</li>
<li><p>Windows</p>
</li>
</ul>
<p>You'll also want to learn the larger strategies and rules that govern how OSs are built and run. As a DevOps engineer, technical knowledge and conceptual knowledge are equally important.</p>
<p>Some of the topics you should learn about operating systems include:</p>
<ul>
<li><p>Processor</p>
</li>
<li><p>Memory/Storage</p>
</li>
<li><p>I/O Management</p>
</li>
<li><p>Virtualization</p>
</li>
<li><p>File Systems</p>
</li>
<li><p>Startup Management (init)</p>
</li>
<li><p>Service Management (systemd)</p>
</li>
<li><p>Threads and Concurrency</p>
</li>
</ul>
<h3 id="heading-3-review-networking-security-and-protocols"><strong>3. Review networking security and protocols.</strong></h3>
<p>Networking is another essential aspect of the technology stack that a DevOps team relies on. Networking enables communication between different devices, applications, and services within and outside the organization.</p>
<p>As a DevOps engineer, you need to understand how networking works, how to troubleshoot network issues, how to secure network connections, and how to optimize network performance.</p>
<p>Some of the topics you should learn about networking security and protocols include:</p>
<ul>
<li><p>OSI Model</p>
</li>
<li><p>HTTP</p>
</li>
<li><p>HTTPS</p>
</li>
<li><p>FTP/SFTP</p>
</li>
<li><p>SSL/TLS</p>
</li>
<li><p>SSH</p>
</li>
<li><p>Port Forwarding</p>
</li>
<li><p>DNS</p>
</li>
<li><p>Email protocols and authentication:</p>
<ul>
<li><p>SMTP</p>
</li>
<li><p>IMAPS</p>
</li>
<li><p>POP3S</p>
</li>
<li><p>DMARC</p>
</li>
<li><p>SPF</p>
</li>
<li><p>DomainKeys (DKIM)</p>
</li>
<li><p>White/grey listing</p>
</li>
</ul>
</li>
</ul>
<p>You should also learn about different types of network tools and services that can help you manage your network infrastructure, such as:</p>
<ul>
<li><p>Forward Proxy</p>
</li>
<li><p>Caching Server</p>
</li>
<li><p>Reverse Proxy</p>
</li>
<li><p>Load Balancer</p>
</li>
<li><p>Firewall</p>
</li>
<li><p>Network tools:</p>
<ul>
<li><p>traceroute</p>
</li>
<li><p>mtr</p>
</li>
<li><p>ping</p>
</li>
<li><p>tcpdump</p>
</li>
<li><p>netstat</p>
</li>
<li><p>dig</p>
</li>
<li><p>scp</p>
</li>
<li><p>iptables/nftables</p>
</li>
<li><p>ufw/firewalld</p>
</li>
<li><p>nmap</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-4-understand-infrastructure-as-code"><strong>4. Understand Infrastructure as Code.</strong></h3>
<p>Infrastructure as Code (IaC) is a key DevOps practice that enables you to automate the provisioning and management of your IT infrastructure using code. Instead of manually configuring and updating servers, networks, storage, and other infrastructure elements, you can use a high-level descriptive language to define the desired state of your infrastructure and let a tool like Terraform, AWS CloudFormation, or Azure Resource Manager (ARM) apply it for you.</p>
<p>IaC has many benefits for DevOps teams, such as:</p>
<ul>
<li><p>Faster and more reliable deployments: You can provision infrastructure on demand in minutes instead of hours or days, and ensure that every environment is consistent and reproducible.</p>
</li>
<li><p>Improved scalability and elasticity: You can easily scale up or down your infrastructure based on your application's needs, and pay only for what you use.</p>
</li>
<li><p>Enhanced security and compliance: You can enforce security policies and best practices across your infrastructure, and track changes and audit logs for compliance purposes.</p>
</li>
<li><p>Reduced costs and risks: You can avoid human errors and configuration drift that can lead to downtime, performance issues, or security breaches.</p>
</li>
</ul>
<p>Some of the topics you should learn about IaC include:</p>
<ul>
<li><p>IaC tools and frameworks: Learn how to use tools like Terraform, AWS CloudFormation, or Azure Resource Manager (ARM) to define and deploy your infrastructure as code. Each tool has its own syntax, features, and advantages.</p>
</li>
<li><p>IaC principles and best practices: Learn how to write clean, modular, reusable, and maintainable code for your infrastructure. Follow the DRY (Don't Repeat Yourself) principle, use version control, test your code, document your code, etc.</p>
</li>
<li><p>IaC patterns and architectures: Learn how to design your infrastructure to support different scenarios and requirements, such as high availability, disaster recovery, load balancing, etc. Use cloud design patterns to optimize your infrastructure for performance, scalability, security, etc.</p>
</li>
</ul>
<h3 id="heading-5-adopt-continuous-integrationcontinuous-delivery-cicd-tools"><strong>5. Adopt Continuous Integration/Continuous Delivery (CI/CD) tools.</strong></h3>
<p>Continuous Integration/Continuous Delivery (CI/CD) is another core DevOps practice that enables you to automate the process of building, testing, and deploying your software applications. CI/CD helps you deliver software faster and more frequently, while ensuring quality and reliability.</p>
<p>CI/CD consists of two main stages:</p>
<ul>
<li><p>Continuous Integration (CI): This is the process of merging code changes from multiple developers into a shared repository (such as GitHub) and running automated tests to verify that the code works as expected. CI helps you detect bugs early, improve code quality, and reduce integration conflicts.</p>
</li>
<li><p>Continuous Delivery (CD): This is the process of delivering code changes from the repository to different environments (such as development, testing, staging, or production) using automated pipelines. CD helps you deploy software faster and more consistently, while minimizing human errors and manual interventions.</p>
</li>
</ul>
<p>Some of the topics you should learn about CI/CD include:</p>
<ul>
<li><p>CI/CD tools and platforms: Learn how to use tools like Jenkins, GitLab CI, Travis CI, GitHub Actions, TeamCity, CircleCI, Drone, Azure DevOps, or AWS CodePipeline/CodeBuild to create and manage your CI/CD pipelines. Each tool has its own features and capabilities.</p>
</li>
<li><p>CI/CD principles and best practices: Learn how to implement CI/CD effectively in your DevOps workflow. Follow the principles of frequent integration and fast feedback loops.</p>
</li>
</ul>
<h3 id="heading-6-invest-in-application-and-infrastructure-monitoring"><strong>6. Invest in application and infrastructure monitoring.</strong></h3>
<p>Application and infrastructure monitoring is the process of collecting and analyzing data from your software applications and backend components to measure their performance, health, availability, and user experience. Monitoring helps you detect and troubleshoot issues, optimize resource utilization, improve service quality, and ensure customer satisfaction.</p>
<p>Application monitoring tracks metrics such as response time, error rate, throughput, and user satisfaction from your web or mobile applications. You can use tools like Real User Monitoring (RUM) or Synthetic Monitoring to measure how your applications perform from the end-user perspective. You can also use tools like Application Performance Monitoring (APM) or Distributed Tracing to measure how your applications perform internally, such as how they interact with microservices, databases, or APIs.</p>
<p>Infrastructure monitoring tracks metrics such as CPU utilization, memory usage, disk I/O, network traffic, and uptime from your servers, virtual machines, containers, databases, and other backend components. You can use tools like Datadog, Amazon CloudWatch, Azure Monitor, or IBM Cloud Monitoring to collect and visualize infrastructure metrics from various sources.</p>
<p>Application and infrastructure monitoring are complementary practices that provide you with a holistic view of your system's performance and reliability. By correlating application and infrastructure metrics, you can identify the root cause of issues faster and more accurately.</p>
<p>Some of the topics you should learn about application and infrastructure monitoring include:</p>
<ul>
<li><p>Monitoring tools and platforms: Learn how to use tools like Datadog, Amazon CloudWatch, Azure Monitor, IBM Cloud Monitoring, New Relic, AppDynamics, Instana, etc. to collect and visualize application and infrastructure metrics from various sources. Each tool has its own features and capabilities.</p>
</li>
<li><p>Monitoring principles and best practices: Learn how to implement monitoring effectively in your DevOps workflow. Follow the principles of observability (the ability to infer the internal state of a system from its external outputs), the four golden signals (latency, traffic, errors, saturation), the RED method (request rate, error rate, duration), the USE method (utilization, saturation, errors), etc.</p>
</li>
<li><p>Monitoring patterns and architectures: Learn how to design your monitoring system to support different scenarios and requirements, such as high availability, scalability, security, etc. Use cloud design patterns to optimize your monitoring system for performance, cost-efficiency, reliability, etc.</p>
</li>
</ul>
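<p>The RED method above is simple enough to sketch: given a window of request records, compute the request rate, error rate, and a latency percentile. The record shape here is an illustrative assumption:</p>
<pre><code class="lang-python">def red_metrics(requests, window_seconds=60):
    # requests: list of dicts like {'latency_ms': 120, 'status': 200}
    total = len(requests)
    errors = sum(1 for r in requests if r['status'] >= 500)
    latencies = sorted(r['latency_ms'] for r in requests)
    p95 = latencies[max(0, int(0.95 * total) - 1)] if latencies else 0
    return {
        'rate_per_s': total / window_seconds,            # Rate
        'error_rate': errors / total if total else 0.0,  # Errors
        'p95_latency_ms': p95,                           # Duration
    }
</code></pre>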
<h3 id="heading-7-study-cloud-providers"><strong>7. Study cloud providers.</strong></h3>
<p>Cloud providers are companies that offer cloud computing services such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), etc. Cloud computing enables you to access computing resources on demand over the internet without having to manage them yourself.</p>
<p>As a DevOps engineer, you need to understand how cloud providers work, what services they offer, how to use them efficiently and securely, and how to integrate them with your DevOps tools and processes.</p>
<p>Some of the popular cloud providers you should learn about include:</p>
<ul>
<li>AWS</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1704820905664/1ca7abb9-eeb6-42b5-8daa-93436b7da5d2.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Google Cloud</p>
</li>
<li><p>Azure</p>
</li>
<li><p>Digital Ocean</p>
</li>
<li><p>Heroku</p>
</li>
<li><p>Linode</p>
</li>
<li><p>Vultr</p>
</li>
<li><p>Alibaba Cloud</p>
</li>
</ul>
<p>Each cloud provider has its own advantages and disadvantages in terms of features, pricing, reliability, scalability, security, etc. You should compare and contrast different cloud providers based on your application's needs and preferences.</p>
<p>Some of the topics you should learn about cloud providers include:</p>
<ul>
<li><p>Cloud computing concepts and models: Learn the basic concepts and terminology of cloud computing, such as cloud service models (IaaS, PaaS, SaaS), cloud deployment models (public, private, hybrid, multi-cloud), cloud characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service), etc.</p>
</li>
<li><p>Cloud provider services and features: Learn the different types of services and features that each cloud provider offers, such as compute, storage, networking, database, analytics, security, management, etc. Learn how to use these services and features to build and run your applications in the cloud.</p>
</li>
<li><p>Cloud provider tools and platforms: Learn how to use the tools and platforms that each cloud provider provides to manage and monitor your cloud resources and applications, such as AWS Console, Google Cloud Console, Azure Portal, AWS CLI, Google Cloud SDK, Azure CLI, AWS CloudFormation, Google Cloud Deployment Manager, Azure Resource Manager, etc.</p>
</li>
<li><p>Cloud provider best practices and recommendations: Learn how to follow the best practices and recommendations that each cloud provider suggests to optimize your cloud usage and performance, such as security best practices, cost optimization best practices, performance optimization best practices, reliability best practices, etc.</p>
</li>
</ul>
<h3 id="heading-8-learn-cloud-design-patterns"><strong>8. Learn cloud design patterns.</strong></h3>
<p>Cloud design patterns are general solutions to common problems or challenges that arise when designing and developing applications in the cloud. Cloud design patterns provide guidance and best practices on how to use cloud services and features effectively and efficiently.</p>
<p>As a DevOps engineer, you need to learn how to apply cloud design patterns to your application architecture and infrastructure design. Cloud design patterns can help you improve your application's performance, scalability, reliability, security, availability, etc.</p>
<p>Some of the common cloud design patterns you should learn about include:</p>
<ul>
<li><p>Availability patterns: These patterns help you ensure that your application is always available and responsive to user requests. Examples of availability patterns are Health Endpoint Monitoring (monitoring the health of an application using a specific URL endpoint), Queue-Based Load Leveling (using a queue to distribute workloads evenly across multiple instances), Throttling (limiting the number of requests that an application can accept or process), etc.</p>
</li>
<li><p>Data management patterns: These patterns help you manage your data effectively and efficiently in the cloud. Examples of data management patterns are CQRS (separating read and write operations for a data store), Event Sourcing (capturing changes to an application state as a sequence of events), Sharding (partitioning data across multiple data stores), etc.</p>
</li>
<li><p>Design and implementation patterns: These patterns help you design and implement your application logic and functionality in the cloud. Examples of design and implementation patterns are Microservices (decomposing an application into small independent services), Serverless (using cloud functions to execute code without managing servers), Strangler (gradually replacing a legacy system with a new system), etc.</p>
</li>
<li><p>Management and monitoring patterns: These patterns help you manage and monitor your cloud resources and applications. Examples of management and monitoring patterns are Autoscaling (adjusting the number of instances or resources based on demand), Circuit Breaker (handling failures and preventing cascading failures), Compensating Transaction (undoing the effects of a previous operation), etc.</p>
</li>
<li><p>Performance and scalability patterns: These patterns help you improve your application's performance and scalability in the cloud. Examples of performance and scalability patterns are Cache-Aside (loading data on demand into a cache from a data store), CDN (using a distributed network of servers to deliver content to users), Load Balancer (distributing incoming requests across multiple instances or resources), etc.</p>
</li>
<li><p>Resiliency patterns: These patterns help you improve your application's resiliency and fault tolerance in the cloud. Examples of resiliency patterns are Bulkhead (isolating elements of an application to prevent failures from spreading), Leader Election (coordinating the actions of multiple instances of a service), Retry (repeating an operation that failed due to transient errors), etc.</p>
</li>
<li><p>Security patterns: These patterns help you improve your application's security and compliance in the cloud. Examples of security patterns are Federated Identity (delegating user authentication to an external identity provider), Role-Based Access Control (granting access to resources based on roles and permissions), Valet Key (using a token or key to grant limited access to resources), etc.</p>
</li>
</ul>
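<p>As a small illustration of the resiliency patterns above, here is a minimal Python sketch of the Retry pattern with exponential backoff. The <code>flaky_operation</code> function, <code>TransientError</code> class, and retry parameters are hypothetical, chosen only to show the shape of the pattern.</p>

```python
import time

class TransientError(Exception):
    """A failure expected to resolve on its own (e.g. a network blip)."""

def retry(operation, max_attempts=3, base_delay=0.01):
    """Retry an operation that fails with transient errors,
    doubling the delay between attempts (exponential backoff)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))

# A hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}
def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary glitch")
    return "ok"

result = retry(flaky_operation)
```

<p>The same idea underpins the retry behavior built into tools such as the AWS SDKs; the point of the pattern is that transient failures are absorbed instead of cascading.</p>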
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Becoming a DevOps engineer is not an easy task, but it is a rewarding and fulfilling career path. By following this DevOps roadmap, you can learn the essential skills and concepts that will help you succeed in this role.</p>
<p>Remember that this roadmap is not a definitive or exhaustive guide, but rather a starting point for your learning journey. You should always keep learning and updating your knowledge as new technologies and practices emerge in the DevOps field.</p>
<p>I hope this article has given you some useful insights and resources to help you become a DevOps engineer. If you have any questions or feedback, please feel free to contact me.</p>
]]></content:encoded></item><item><title><![CDATA[The AWS Well-Architected Framework: 
6 pillars of successful architectures.]]></title><description><![CDATA[Amazon Web Services (AWS) is currently the world’s leading cloud platform, with over 1 million active users in 190+ countries, and consistent year-over-year growth rates of more than 30 percent.
To help architects and developers learn and implement b...]]></description><link>https://blog.omprakashthakur.com.np/the-aws-well-architected-framework-6-pillars-of-successful-architectures</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/the-aws-well-architected-framework-6-pillars-of-successful-architectures</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Cloud Practitioner]]></category><category><![CDATA[aws _six_pillars_waf]]></category><category><![CDATA[omthakur]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Tue, 11 Jul 2023 04:54:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689055629217/4b45358a-f3de-40e7-9b83-b64c5acf4d0b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Amazon Web Services (AWS) is currently the world’s leading cloud platform, with over <strong>1 million</strong> active users in 190+ countries, and consistent year-over-year growth rates of more than 30 percent.</p>
<p>To help architects and developers learn and implement best practices in building systems on AWS, Amazon introduced the Well-Architected Framework in 2012.<br />At the heart of the Well-Architected Framework are the six pillars, which form a foundation for systems built on AWS. As Amazon puts it, “Incorporating these pillars into your architecture will help you produce stable and efficient systems. This will allow you to focus on the other aspects of design, such as functional requirements.”<br />In this article, I’ll review the six pillars of the AWS Well-Architected Framework and offer a brief explanation of each.</p>
<p><strong>Pillar 1: Operational Excellence</strong></p>
<p>In the Operational Excellence pillar, developers will find an overview of design principles and best practices in the areas of organization, preparation, operation, and evolution. This pillar encompasses the ability to:<br />• Support development and run workloads effectively<br />• Gain insight into their operations<br />• Continuously improve supporting processes and procedures to deliver business value</p>
<p><strong>Pillar 2: Security</strong><br />The Security pillar focuses on best practices in the areas of security foundations, identity and access management, detection, infrastructure protection, data protection, and incident response. Developers will understand how to control user permissions, recognize security incidents, safeguard systems and services, and implement data protection measures.</p>
<p><strong>Pillar 3: Reliability</strong><br />In the Reliability pillar, we learn that the primary key to the reliability of a workload in the cloud is resiliency: the ability to recover from disruptions, dynamically acquire resources to meet demand, and mitigate issues such as misconfigurations. The other two key reliability factors are:<br />• Availability: The workload’s ability to successfully perform its function when needed<br />• Disaster Recovery (DR) objectives: Strategies for recovering the workload in case of a natural disaster, a large-scale technical failure, or a deliberate attack</p>
<p><strong>Pillar 4: Performance Efficiency</strong><br />The Performance Efficiency pillar is all about taking a data-driven approach to building a successful architecture in AWS. It encompasses the efficient use of computing resources to meet system requirements and the maintenance of that efficiency amid changes in demand and technologies. Performance Efficiency covers best practices in the areas of selection, review, monitoring, and tradeoffs.</p>
<p><strong>Pillar 5: Cost Optimization</strong><br />In the Cost Optimization pillar, we focus on our ability to run systems in a way that delivers business value at the lowest possible price point. As with the other pillars, we must often consider tradeoffs of one benefit versus another, e.g. speed-to-market versus up-front cost minimization. Cost Optimization encompasses best practices in five areas:</p>
<p>• Practice Cloud Financial Management<br />• Expenditure and usage awareness<br />• Use cost-effective resources<br />• Manage demand and supply resources<br />• Optimize over time</p>
<p><strong>Pillar 6: Sustainability</strong><br />The focal point of the Sustainability pillar is minimizing environmental impact, particularly in terms of energy consumption and efficiency. The goal here is to achieve maximum benefit from the resources provisioned while also minimizing the total resources required. This effort can encompass, for example,<br />• Selecting efficient programming languages<br />• Adopting modern algorithms<br />• Using efficient approaches to data storage<br />• Deploying to appropriately sized and efficient infrastructures<br />• Minimizing requirements for high-powered end-user hardware.</p>
<p>To learn more about the AWS Well-Architected Framework, I highly recommend exploring the wide array of resources available on <a target="_blank" href="https://aws.amazon.com/blogs/apn/the-6-pillars-of-the-aws-well-architected-framework/">Amazon’s dedicated website</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Choose between Amazon RDS and AWS EC2.]]></title><description><![CDATA[The choice between a database on an EC2 instance and RDS is essentially the choice between an unmanaged environment where the burden is on you to manage everything yourself and a managed service where the cloud vendor shoulders the burden of mundane ...]]></description><link>https://blog.omprakashthakur.com.np/choose-between-amazon-rds-and-aws-ec2</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/choose-between-amazon-rds-and-aws-ec2</guid><category><![CDATA[AWS RDS]]></category><category><![CDATA[ec2]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Mon, 26 Jun 2023 10:08:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1687773977113/12d06e4c-b529-45e5-8e76-88b685cc4f70.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The choice between a database on an EC2 instance and RDS is essentially the choice between an unmanaged environment where the burden is on you to manage everything yourself and a managed service where the cloud vendor shoulders the burden of mundane management tasks. A simple API call gives you control over deployment, backups, snapshots, restores, sizing, high availability, and replicas. In contrast, the self-managed database on the EC2 option requires you to manually set up, configure, manage, and tune the various components, including Amazon EC2 instances, storage volumes, scalability, networking, and security.</p>
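<p>To make the &ldquo;simple API call&rdquo; concrete: provisioning a managed database through Amazon RDS comes down to a single <code>create_db_instance</code> request. The sketch below (Python with boto3) only assembles the request parameters; the identifier, sizes, and credentials are hypothetical, and the actual call is left commented out because it requires AWS credentials and incurs cost.</p>

```python
# Sketch of the RDS provisioning call. All identifiers and sizes below
# are hypothetical, for illustration only.
params = {
    "DBInstanceIdentifier": "demo-db",   # hypothetical instance name
    "Engine": "mysql",
    "DBInstanceClass": "db.t3.micro",
    "AllocatedStorage": 20,              # GiB
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",   # use AWS Secrets Manager in practice
    "MultiAZ": True,                     # managed high availability
    "BackupRetentionPeriod": 7,          # automated backups, in days
}

# With credentials configured, the actual call would be:
# import boto3
# rds = boto3.client("rds")
# rds.create_db_instance(**params)
```

<p>Every concern listed above &mdash; backups, high availability, sizing &mdash; appears here as a declarative parameter rather than something you install and tune yourself on EC2.</p>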
]]></content:encoded></item><item><title><![CDATA[Six Ways Integration Can Improve Your Cloud Services]]></title><description><![CDATA[The success of a software-as-a-service (SaaS) product depends on several different factors, including time to market, functionality and ease of use, and customer service. One of the most important things in enabling those successes, however, concerns...]]></description><link>https://blog.omprakashthakur.com.np/six-methods-to-integration-can-improve-your-cloud-services</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/six-methods-to-integration-can-improve-your-cloud-services</guid><category><![CDATA[AWS]]></category><category><![CDATA[Amazon Web Services]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Mon, 19 Jun 2023 17:25:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1687195145068/9799d484-ba05-4c6f-81df-97a1f02f5a26.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The success of a software-as-a-service (SaaS) product depends on several different factors, including time to market, functionality and ease of use, and customer service. One of the most important things in enabling those successes, however, concerns integration and whether the SaaS tool can connect customer and partner systems and integrate the data that’s powering the solution behind the scenes.</p>
<p>There are many backend data management challenges that a SaaS enterprise must face when it comes to delivering its cloud services. Limitations integrating legacy systems and custom scripting hinder SaaS companies’ ability to deliver expanded value and full solution potential for their customers. Such integrations can prove to be very expensive, as many cloud-service companies attempt to custom-build and manage them themselves, which in turn leads to costly and time-consuming headaches as the company grows.</p>
<p><strong><em>Expansive Connectivity for Cloud and On-Premise Systems</em></strong></p>
<p>An expansive integration platform provides all the connectors and protocols that your customers and backend systems require to successfully integrate any new trading partner and application within an enterprise’s environment.</p>
<p><strong><em>Self-Service Capabilities for End Users</em></strong></p>
<p>Self-service tools and real-time visibility provide unrivaled simplicity and intelligence in ecosystem-driven integration scenarios. By spanning all modern integration use cases, an advanced integration platform centralizes the governance of partner, supplier, and customer interactions for frictionless business process orchestration.</p>
<p><strong><em>Single, Scalable Platform for Every Data Interaction</em></strong></p>
<p>The flexibility gained from a single integration platform enables enterprises to better handle digital transformation initiatives while introducing new technologies to make it easier to connect and securely exchange information with any new ecosystem partner.</p>
<p><strong><em>Database Independence for Advanced Scalability</em></strong></p>
<p>A centralized platform is DevOps-enabled, which allows quick spin-up, spin-down, flexible licensing, database independence, and full support for immutable infrastructure patterns common in SaaS environments.</p>
<p><strong><em>REST API Support for Flexible Interfacing</em></strong></p>
<p>Data integration solutions must support REST API connectivity, which supports modern data movement requirements and the “headless” strategy for data transformation.</p>
<p><strong><em>Enhanced Data Visibility Across the Entire Ecosystem</em></strong></p>
<p>As a business, visibility into your data is not only required in many cases; it’s also extremely empowering. Knowing the state of your revenue-generating processes can be the difference between success and failure in a highly competitive business environment.</p>
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>A modern integration solution should free up your architects and development, DevOps, and support teams so they can concentrate on building a high-value SaaS solution without having to worry about the data services infrastructure. Your business can have confidence that its integration solution will scale to support growing customer needs, fit seamlessly into the SaaS environment, and provide all the external and internal integration and connectivity you demand.</p>
]]></content:encoded></item><item><title><![CDATA[AWS DevOps]]></title><description><![CDATA[AWS Cloud DevOps encompasses a broad range of practices and technologies related to the development and operation of applications in the cloud using Amazon Web Services (AWS). The scope of AWS Cloud DevOps typically includes the following areas:

Inf...]]></description><link>https://blog.omprakashthakur.com.np/aws-devops</link><guid isPermaLink="true">https://blog.omprakashthakur.com.np/aws-devops</guid><category><![CDATA[aws devops]]></category><dc:creator><![CDATA[Om Thakur]]></dc:creator><pubDate>Mon, 22 May 2023 10:14:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1684750336476/47d7dc86-6272-4896-8ea9-fc61b80adb4e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Cloud DevOps encompasses a broad range of practices and technologies related to the development and operation of applications in the cloud using Amazon Web Services (AWS). The scope of AWS Cloud DevOps typically includes the following areas:</p>
<ol>
<li><p>Infrastructure as Code (IaC): DevOps teams use tools like AWS CloudFormation or AWS CDK to define and manage their infrastructure resources in a declarative manner. Infrastructure is treated as code, allowing for versioning, automation, and reproducibility.</p>
</li>
<li><p>Continuous Integration and Continuous Delivery (CI/CD): CI/CD pipelines automate the build, test, and deployment processes, enabling teams to rapidly and reliably deliver applications. AWS offers services like AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy to facilitate CI/CD workflows.</p>
</li>
<li><p>Configuration Management: Tools such as AWS Systems Manager and AWS OpsWorks enable the management and automation of configurations across AWS resources. Configuration management ensures consistency and enables efficient scaling and management of applications.</p>
</li>
<li><p>Monitoring and Logging: AWS provides services like AWS CloudWatch and AWS CloudTrail to monitor and log various aspects of your applications and infrastructure. Monitoring helps ensure performance, availability, and security, while logging enables auditing, troubleshooting, and analysis.</p>
</li>
<li><p>Scalability and Auto Scaling: AWS offers features like Auto Scaling, Elastic Load Balancing, and serverless computing (e.g., AWS Lambda) to help scale applications based on demand. These features ensure that resources can be dynamically provisioned or de-provisioned to meet workload fluctuations.</p>
</li>
<li><p>Security and Compliance: AWS provides numerous security features and services to protect applications and data. DevOps teams need to understand and implement security best practices, encryption mechanisms, access controls, and compliance standards relevant to their applications.</p>
</li>
<li><p>Disaster Recovery and High Availability: AWS offers services such as AWS Backup, AWS Elastic Disaster Recovery, and multi-region deployments to ensure business continuity and high availability of applications. DevOps teams should plan and implement strategies to recover from failures and minimize downtime.</p>
</li>
</ol>
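<p>As a taste of the Infrastructure-as-Code practice listed above, the sketch below builds a minimal CloudFormation template as a Python dict and renders it to JSON (CloudFormation accepts JSON or YAML). The logical ID <code>ArtifactBucket</code> and the stack name are hypothetical.</p>

```python
import json

# A minimal CloudFormation template: one versioned S3 bucket.
# Because it is code, it can be versioned in Git and deployed repeatably.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal IaC example: a versioned S3 bucket",
    "Resources": {
        "ArtifactBucket": {                      # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"}
            },
        }
    },
}

rendered = json.dumps(template, indent=2)
# Deploy (with credentials configured) via the AWS CLI, e.g.:
#   aws cloudformation deploy --template-file template.json --stack-name demo
```

<p>The same template could equally be generated by the AWS CDK or written directly in YAML; the key idea is that the infrastructure definition is an artifact you can review, diff, and reproduce.</p>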
<p>The demand for AWS Cloud DevOps professionals is high due to the growing adoption of cloud computing and the need for efficient and automated application delivery. Organizations are looking for individuals with expertise in AWS services, DevOps methodologies, and automation tools. The specific skills and knowledge sought after in AWS Cloud DevOps include:</p>
<ol>
<li><p>Proficiency in AWS services related to infrastructure provisioning, deployment, and management.</p>
</li>
<li><p>Experience with infrastructure as code tools like AWS CloudFormation, AWS CDK, or Terraform.</p>
</li>
<li><p>Knowledge of CI/CD tools and practices, such as AWS CodePipeline, Jenkins, or GitLab CI/CD.</p>
</li>
<li><p>Understanding of containerization technologies like Docker and container orchestration platforms like Amazon Elastic Kubernetes Service (EKS).</p>
</li>
<li><p>Familiarity with monitoring and logging tools like AWS CloudWatch, AWS X-Ray, or ELK Stack.</p>
</li>
<li><p>Expertise in scripting and automation using languages like Python, PowerShell, or Bash.</p>
</li>
<li><p>Understanding of security practices, identity and access management, and compliance frameworks in AWS.</p>
</li>
<li><p>Knowledge of networking concepts and experience with AWS networking services.</p>
</li>
<li><p>Strong problem-solving and troubleshooting skills to resolve issues related to application deployment and operations.</p>
</li>
<li><p>Familiarity with Agile and DevOps methodologies, collaboration tools, and version control systems like Git.</p>
</li>
</ol>
<p>By acquiring these skills and keeping up with the latest trends and updates in AWS services, individuals can position themselves to meet the demands of the AWS Cloud DevOps market. Continuous learning and hands-on experience are crucial to stay competitive in this field.</p>
]]></content:encoded></item></channel></rss>