machine-learning · llm · python · internship · nlp
Fine-Tuning LLMs for Classification: Lessons from My AI Internship
What I learned fine-tuning language models for NER and classification tasks at WhizHack Technologies.
December 28, 2025 · 5 min read
The Challenge
During my AI research internship at WhizHack Technologies, I was tasked with improving our NLP pipeline for classification and Named Entity Recognition (NER). The existing models were accurate but too slow for production.
The Approach
1. Baseline Analysis
First, I profiled the existing pipeline:
- Inference latency: ~200ms per request
- Accuracy: 87% on our test set
- Model size: 1.2GB
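Numbers like these come from timing repeated requests rather than a single call. A minimal sketch of that kind of latency profiling, using a stand-in `predict` function (the real pipeline isn't shown in the post):

```python
import time
import statistics

def predict(text):
    # Stand-in for the real NER/classification pipeline (hypothetical).
    time.sleep(0.002)  # simulate ~2 ms of model work
    return {"label": "ORG", "score": 0.97}

def profile(fn, sample, warmup=3, runs=20):
    # Warm up first so lazy initialization doesn't skew the numbers.
    for _ in range(warmup):
        fn(sample)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(sample)
        times.append((time.perf_counter() - start) * 1000)  # ms
    return statistics.median(times)  # median is robust to outlier runs

latency_ms = profile(predict, "WhizHack builds security tooling.")
print(f"median latency: {latency_ms:.1f} ms")
```

Reporting the median (or a high percentile) rather than the mean keeps one slow outlier from distorting the baseline.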
2. Fine-Tuning Strategy
Instead of fine-tuning the full model, I experimented with LoRA (Low-Rank Adaptation):
```python
# LoRA configuration for efficient fine-tuning
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base_model, config)  # base_model: the pretrained model being adapted
```

3. Quantization
Applied 8-bit quantization to reduce the memory footprint without significant accuracy loss.
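The core idea can be sketched without any ML framework: map float32 weights to int8 with a per-tensor scale, trading a small bounded rounding error for a 4x smaller tensor. A minimal, hypothetical sketch (the production setup would use an off-the-shelf 8-bit backend, not hand-rolled code):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric quantization: pick a scale so the largest weight maps to 127.
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 stores 1 byte per weight vs 4 for float32
```

The rounding error per weight is at most half a quantization step (`scale / 2`), which is why accuracy typically degrades only slightly.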
Results
- +5% accuracy (87% → 92%)
- -8% latency (200ms → 184ms)
- 40% smaller model size
Key Takeaways
- Data quality > model size: Cleaning our training data gave bigger gains than using larger models
- Profile first: Don't optimize blindly. Find the actual bottlenecks
- LoRA is magic: Fine-tuning with LoRA is fast and effective for domain adaptation
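The last point is easy to quantify: LoRA trains only two small low-rank factors per adapted matrix while the base weights stay frozen. A numpy sketch with an illustrative hidden size (`d` is assumed; `r` and `alpha` match the config above):

```python
import numpy as np

d, r, alpha = 512, 16, 32  # d is illustrative; r and lora_alpha as configured
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d)).astype(np.float32)  # frozen pretrained weight
A = rng.normal(size=(r, d)).astype(np.float32)  # trainable down-projection
B = np.zeros((d, r), dtype=np.float32)          # trainable up-projection, zero-init

x = rng.normal(size=(1, d)).astype(np.float32)
y = x @ (W + (alpha / r) * B @ A).T             # adapted forward pass

full = W.size           # parameters a full fine-tune would update
lora = A.size + B.size  # parameters LoRA actually trains
print(lora / full)      # 0.0625: ~6% of the weights per adapted matrix
```

Because `B` starts at zero, the adapter contributes nothing at initialization, so training begins exactly from the pretrained model's behavior and only nudges it toward the domain.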
This experience shaped how I think about ML systems—it's not just about accuracy, but about building systems that work in production.