machine-learning · llm · python · internship · nlp

Fine-Tuning LLMs for Classification: Lessons from My AI Internship

What I learned fine-tuning language models for NER and classification tasks at WhizHack Technologies.

December 28, 2025 · 5 min read

The Challenge

During my AI research internship at WhizHack Technologies, I was tasked with improving our NLP pipeline for classification and Named Entity Recognition (NER). The existing models were accurate but too slow for production.

The Approach

1. Baseline Analysis

First, I profiled the existing pipeline:

  • Inference latency: ~200ms per request
  • Accuracy: 87% on our test set
  • Model size: 1.2GB
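A profiling pass like the one above can be as simple as timing the pipeline over a batch of sample requests. A minimal sketch, assuming a `pipeline` callable and a list of sample inputs (both hypothetical names, standing in for whatever your stack exposes):

```python
import statistics
import time

def profile_latency(pipeline, samples, warmup=3):
    """Measure per-request latency (ms) of an inference callable."""
    for text in samples[:warmup]:
        pipeline(text)  # warm-up runs: caches, lazy init, JIT
    timings = []
    for text in samples:
        start = time.perf_counter()
        pipeline(text)
        timings.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": statistics.median(timings),
        "p95_ms": sorted(timings)[int(0.95 * (len(timings) - 1))],
    }

# Stand-in "model" that just sleeps ~1ms per request:
stats = profile_latency(lambda t: time.sleep(0.001), ["example"] * 20)
```

Reporting p50/p95 rather than a single mean matters in production, since tail latency is usually what breaks SLAs.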

2. Fine-Tuning Strategy

Instead of fine-tuning the full model, I experimented with parameter-efficient methods, starting with LoRA:

```python
# LoRA configuration for efficient fine-tuning
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
```
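To see why this is efficient: instead of updating a full d×d projection matrix, LoRA learns two low-rank factors, B (d×r) and A (r×d), and trains only those. A back-of-the-envelope sketch using the r=16 from the config above and a hypothetical hidden size of 4096 (not a detail from our actual model):

```python
d, r = 4096, 16                 # hidden size (assumed), LoRA rank from the config
full = d * d                    # trainable params if we fine-tuned the whole matrix
lora = d * r + r * d            # params in the low-rank factors B (d x r) and A (r x d)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# → full: 16,777,216  lora: 131,072  ratio: 128x
```

At r=16 that is ~128x fewer trainable parameters per targeted matrix, which is why LoRA fine-tuning runs fit on modest GPUs.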

3. Quantization

Applied 8-bit quantization to reduce memory footprint without significant accuracy loss.
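As a sketch of the core idea: absmax 8-bit quantization rescales each weight tensor so its largest value maps to 127, then rounds to integers. This is the concept, not the exact scheme a production library implements:

```python
def quantize_8bit(weights):
    """Absmax quantization: map floats into the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

w = [0.8, -0.3, 0.05, -1.2]
q, scale = quantize_8bit(w)
restored = dequantize(q, scale)
# Each weight now costs 1 byte instead of 4 (fp32): a 4x memory cut,
# with rounding error bounded by scale / 2 per weight.
```

The bound on per-weight error (half a quantization step) is why accuracy loss stays small for well-conditioned weight distributions.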

Results

  • +5 points accuracy (87% → 92%)
  • -8% latency (200ms → 184ms)
  • 40% smaller model size

Key Takeaways

  1. Data quality > model size: Cleaning our training data gave bigger gains than using larger models
  2. Profile first: Don't optimize blindly. Find the actual bottlenecks
  3. LoRA is magic: Fine-tuning with LoRA is fast and effective for domain adaptation

This experience shaped how I think about ML systems—it's not just about accuracy, but about building systems that work in production.
