Distill-R1 is an open-source project that demonstrates knowledge distillation: creating smaller, high-performing, domain-adapted LLMs from powerful proprietary teachers (Gemini, Claude). It generates synthetic data from teacher responses to enterprise-style prompts, then distills that knowledge into an open-source student model (Llama 3.1 8B or Mistral 7B) via LoRA/PEFT. The resulting model is quantized for local deployment via Ollama, delivering near-proprietary performance on internal tasks at dramatically lower cost and with full data privacy. The project includes full training scripts, an evaluation suite, before/after comparisons, and a local inference demo: a complete end-to-end MLOps showcase for LLM compression and adaptation.
Distill-R1 addresses a critical enterprise LLM challenge: proprietary models like Gemini and Claude deliver excellent performance but are expensive, closed, and introduce privacy risks when processing internal data. Distill-R1 enables organizations to "clone" teacher model capabilities into smaller open-source models tailored to their domain (support tickets, product Q&A, compliance queries). The distilled model runs locally with no data exfiltration, targeting 85–95% of teacher performance at roughly one-tenth the inference cost. The project serves as both a practical distillation framework and a portfolio demonstration of advanced LLM engineering: synthetic data curation, distillation training, rigorous evaluation, and production-ready quantization.
Distill-R1 aims to reduce LLM inference costs by 80–90% while maintaining high domain performance and eliminating vendor lock-in. Enterprises gain proprietary-grade intelligence on internal data without ongoing API spend or privacy exposure. Primary value drivers: cost optimization, data sovereignty, performance customization, and reduced dependency on closed models.
All training and inference occur locally. Synthetic data generation can be configured to avoid PII. The open-source code enables security review. No external API calls are made after the teacher data collection phase, so every later stage can run air-gapped.
As an MLE, I want to distill Gemini-level performance into a local Llama model for my domain.
As an architect, I want before/after comparisons to justify switching from proprietary APIs.
As a DevSecOps lead, I want a quantized model that runs offline with no data leakage.
Enable enterprises to capture proprietary LLM performance in open, local models through knowledge distillation.
Core capabilities: synthetic data generation, distillation training, evaluation suite, quantization, local deployment.
Success metrics: student accuracy vs teacher, inference cost reduction, latency on local hardware.
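The first two success metrics reduce to simple ratios. The sketch below shows one way they might be computed; the function names and the sample numbers in the usage note are illustrative assumptions, not measured results from the project.

```python
def teacher_relative_accuracy(student_acc: float, teacher_acc: float) -> float:
    """Fraction of the teacher's accuracy that the student retains."""
    if teacher_acc <= 0:
        raise ValueError("teacher accuracy must be positive")
    return student_acc / teacher_acc


def cost_reduction(student_cost: float, teacher_cost: float) -> float:
    """Fractional saving per request relative to the teacher."""
    if teacher_cost <= 0:
        raise ValueError("teacher cost must be positive")
    return 1.0 - student_cost / teacher_cost
```

With hypothetical inputs, a student scoring 0.81 against a teacher scoring 0.90 retains 90% of teacher accuracy, and a student that costs one-tenth as much per request corresponds to a 90% cost reduction.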
Input: prompt set + teacher responses. State tracks training progress, loss curves, evaluation results.
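A minimal sketch of how the input records and run state described above could be modelled. The class and field names (`DistillationExample`, `RunState`, `record_step`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DistillationExample:
    """One training record: a curated prompt plus the teacher's answer."""
    prompt: str
    teacher_response: str
    teacher_name: str = "unknown"  # assumed field, e.g. which teacher produced it


@dataclass
class RunState:
    """Mutable state tracked across a distillation run."""
    step: int = 0
    loss_curve: list[float] = field(default_factory=list)
    eval_results: dict[str, float] = field(default_factory=dict)

    def record_step(self, loss: float) -> None:
        # Advance the step counter and append the loss to the curve.
        self.step += 1
        self.loss_curve.append(loss)
```

Keeping the state in one small object makes it straightforward to log or serialize alongside training artifacts.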
MVP with synthetic generation, LoRA distillation, basic evaluation, quantization, local demo.
PI-1: Synthetic data generation pipeline
PI-2: LoRA distillation training
PI-3: Evaluation suite and metrics
PI-4: Quantization and local deployment
PI-5: Comparison dashboard and polish
Open-source with contribution guidelines for new teacher/student combinations and domain templates.
Prompt curation → Teacher response generation → Student training → Evaluation → Quantization → Local deployment
Teacher generates soft labels → student trained to match probabilities → iterative improvement.
Models scored on weighted metrics (accuracy primary, latency/cost secondary). Top configurations recommended.
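One way the weighted scoring might look in practice is sketched below. The weights (0.6 / 0.2 / 0.2), the `ConfigResult` fields, and the best-value normalization for latency and cost are all assumptions made for illustration; the source only states that accuracy is primary and latency/cost are secondary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigResult:
    name: str
    accuracy: float     # 0..1, higher is better
    latency_ms: float   # must be positive, lower is better
    cost_per_1k: float  # must be positive, lower is better


# Assumed weights: accuracy dominates, latency and cost share the remainder.
WEIGHTS = {"accuracy": 0.6, "latency": 0.2, "cost": 0.2}


def rank_configs(results, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst."""
    if not results:
        return []
    best_latency = min(r.latency_ms for r in results)
    best_cost = min(r.cost_per_1k for r in results)

    def score(r):
        # Latency and cost are normalized so the best configuration scores 1.0.
        return (weights["accuracy"] * r.accuracy
                + weights["latency"] * (best_latency / r.latency_ms)
                + weights["cost"] * (best_cost / r.cost_per_1k))

    return sorted(((r.name, score(r)) for r in results),
                  key=lambda pair: pair[1], reverse=True)
```

The head of the returned list is the candidate for recommendation. Different weights can change the ordering, so they are worth treating as a tunable input rather than a constant.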
Hugging Face ecosystem for training, Ollama for inference. Local execution throughout.
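For the inference side, a locally running Ollama server exposes an HTTP API on port 11434 by default. The sketch below calls its `/api/generate` endpoint using only the standard library; the model name `distill-r1` used in the usage note is hypothetical, and the snippet assumes the quantized model has already been imported into Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

A call such as `generate("distill-r1", "Summarize this support ticket: ...")` would then stay entirely on the local machine, provided the server is running.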
Knowledge distillation with temperature-scaled soft labels and hard label balancing.
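The loss described above can be sketched for a single prediction as follows. This is a dependency-free illustration of the standard formulation (temperature-scaled KL term plus hard-label cross-entropy), not the project's training code. It assumes the teacher's logits are available; a hosted teacher that returns only text would not supply them, and the soft term would not apply as written. The defaults `temperature=2.0` and `alpha=0.5` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps soft-term gradients comparable across temperatures.
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # The hard-label term uses the unscaled student distribution.
    hard_loss = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Here `alpha` is the balance between soft and hard labels: 1.0 trains purely on the teacher's distribution, 0.0 purely on ground-truth labels.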
Training logs, loss curves, evaluation traces.
Distill-R1 follows rigorous MLOps practices in its model lifecycle:
Zero external calls after teacher data collection. All training local. Open code for audit.
Transparent training data provenance. No PII in synthetic prompts.
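A simple screening pass over synthetic prompts might look like the sketch below. The regular expressions are rough heuristics chosen for illustration (emails, US-style SSNs, and 3-3-4 phone numbers) and would miss many real PII forms; the names `PII_PATTERNS`, `find_pii`, and `filter_prompts` are hypothetical.

```python
import re

# Heuristic patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def find_pii(text: str) -> list[str]:
    """Return the sorted names of all patterns that match the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items()
                  if pattern.search(text))


def filter_prompts(prompts: list[str]) -> list[str]:
    """Keep only prompts in which no pattern matched."""
    return [prompt for prompt in prompts if not find_pii(prompt)]
```

Logging which prompts were dropped, and why, would also support the provenance goal, since the filtered set can then be audited.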
Local execution with checkpointing and resume capability.
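The resume behavior can be illustrated with a small loop that persists run metadata and picks up where it left off. This sketch tracks only step counts and losses in JSON; model weights would be saved through the training framework's own mechanism. The function names and the `save_every` parameter are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write atomically: dump to a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Return saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"step": 0, "losses": []}


def train(path: Path, total_steps: int, step_fn, save_every: int = 2) -> dict:
    """Run step_fn from the last saved step up to total_steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        state["losses"].append(step_fn(step))
        state["step"] = step + 1
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            save_checkpoint(path, state)
    return state
```

Calling `train` again with the same path skips the steps already recorded, which is the property that makes an interrupted local run recoverable.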
Expected outcomes:
(A cheeky but honest roadmap of features we're deliberately not building in the MVP — because even distilled models can't fix infinite scope.)