Fine-Tuning Apache-2.0

Unsloth

2x faster LLM training with 80% less memory via custom Triton kernels. Fine-tune 70B models with QLoRA on a single consumer GPU with 24GB VRAM.

Platforms: Linux

Unsloth is a fine-tuning framework that makes training and adapting large language models practical on consumer hardware. Through custom Triton kernels that replace standard attention and MLP computations, Unsloth achieves 2x faster training with 80% less memory than standard Hugging Face training loops. A 70B parameter model that normally requires a multi-GPU server can be fine-tuned with QLoRA on a single 24GB GPU using Unsloth.

Key Features

Custom Triton kernels. Unsloth rewrites the performance-critical operations — attention, cross-entropy loss, RoPE embeddings, RMSNorm — as fused Triton kernels. These eliminate redundant memory reads and writes between operations, dramatically reducing both VRAM consumption and training time compared to PyTorch’s default implementations.
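Kernel fusion is the core idea behind those savings: instead of launching separate kernels that each round-trip intermediate tensors through GPU global memory, one fused kernel reads its inputs once and writes the result once. Unsloth's production kernels are far more involved, but a toy fused elementwise kernel sketches the shape of the technique. This is an illustrative example, not Unsloth's code; it assumes the `triton` package, and the builder below only defines the kernel without launching it (launching requires a CUDA GPU):

```python
def build_fused_add_relu():
    """Define (but do not launch) a toy fused add+ReLU Triton kernel.

    Illustrative only: Unsloth's real kernels fuse attention, RoPE,
    RMSNorm, and cross-entropy, not this simple elementwise op.
    Requires the `triton` package and a CUDA GPU to actually run.
    """
    import triton
    import triton.language as tl

    @triton.jit
    def fused_add_relu(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)  # one read per input...
        y = tl.load(y_ptr + offsets, mask=mask)
        # ...and one write for the fused result: no intermediate tensor
        # is materialized in global memory between the add and the ReLU,
        # which is exactly the traffic an unfused two-kernel version pays.
        tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

    return fused_add_relu
```

An unfused `relu(x + y)` in eager PyTorch writes the sum to memory and reads it back for the ReLU; fusing removes that round trip, and the savings multiply when the fused chain is long (as in attention or cross-entropy loss).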

QLoRA on consumer GPUs. Combine 4-bit quantized base models with low-rank adapters to fine-tune models that would otherwise require enterprise hardware. Unsloth’s optimized kernels work directly on quantized weights, so the memory savings of quantization and the speed improvements of custom kernels compound rather than conflict.
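The memory economics work because the trainable adapter is tiny next to the frozen 4-bit base, so optimizer state (normally the dominant training cost) is only kept for the adapter. A back-of-envelope sketch, assuming illustrative Llama-2-7B-like shapes (hidden size 4096, MLP size 11008, 32 layers) and rank-16 adapters on all seven projection matrices; these numbers are assumptions for the arithmetic, not values read from Unsloth:

```python
# Back-of-envelope: trainable parameters for a rank-16 LoRA on a
# 7B-class model. Shapes below are illustrative assumptions.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """A rank-r LoRA adds two small matrices, A (r x d_in) and B (d_out x r)."""
    return rank * (d_in + d_out)

HIDDEN, MLP, LAYERS, RANK = 4096, 11008, 32, 16

attn = 4 * lora_param_count(HIDDEN, HIDDEN, RANK)   # q, k, v, o projections
mlp = (2 * lora_param_count(HIDDEN, MLP, RANK)      # gate, up projections
       + lora_param_count(MLP, HIDDEN, RANK))       # down projection
trainable = LAYERS * (attn + mlp)

print(f"trainable LoRA params: {trainable:,}")       # ~40M
print(f"fraction of 7B base:   {trainable / 7e9:.2%}")  # ~0.57%
```

Well under 1% of the parameters receive gradients and optimizer moments; the other 99%+ sit frozen in 4-bit precision, which is why the whole job fits on a single consumer GPU.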

Broad model support. Unsloth supports fine-tuning Llama, Mistral, Phi, Qwen, Gemma, and other popular architectures. New model support typically arrives within days of a model’s release, keeping pace with the fast-moving open-weight ecosystem.

GGUF export pipeline. After training, export your fine-tuned model directly to GGUF format at any quantization level. The exported model works immediately with Ollama, llama.cpp, LM Studio, and any other GGUF-compatible tool. This end-to-end pipeline — from training data to local deployment — runs on a single machine.
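In code, the export is a single call on the trained model. The sketch below shows the general shape of Unsloth's `save_pretrained_gguf` helper; check the current Unsloth documentation for the exact signature and supported quantization names, which have changed across versions:

```python
def export_to_gguf(model, tokenizer, out_dir="gguf-model"):
    """Sketch of Unsloth's GGUF export step (signature may vary by version).

    `quantization_method` selects the GGUF quantization level;
    "q4_k_m" is a common quality/size trade-off for local inference.
    """
    model.save_pretrained_gguf(out_dir, tokenizer, quantization_method="q4_k_m")
```

The resulting `.gguf` file can then be served locally, for example by pointing an Ollama Modelfile at it with a `FROM ./<your-model>.gguf` line, or by passing it to llama.cpp directly.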

Hugging Face integration. Unsloth integrates with the Hugging Face Transformers and TRL libraries. Existing training scripts often require only a few lines of change to swap in Unsloth’s optimized model loader. Datasets from Hugging Face Hub work directly with Unsloth’s training pipeline.
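The "few lines of change" amounts to swapping the model loader and keeping the rest of the TRL script. A sketch, assuming `unsloth`, `trl`, and a CUDA GPU are available; the model name, LoRA rank, and hyperparameters are examples only, and `SFTTrainer`'s keyword arguments differ slightly across TRL versions:

```python
def build_trainer(dataset):
    """Sketch: swap Unsloth's loader into an existing TRL fine-tuning script.

    Not runnable without a CUDA GPU plus `unsloth` and `trl` installed;
    all names and hyperparameters here are illustrative.
    """
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments

    # The swap: FastLanguageModel.from_pretrained stands in for
    # AutoModelForCausalLM.from_pretrained; everything downstream
    # (TRL trainer, Hub datasets) stays the same.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # example 4-bit base
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,            # LoRA rank
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    return SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            max_steps=60,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
```

Calling `build_trainer(dataset).train()` then runs a standard TRL training loop, with Unsloth's kernels doing the heavy lifting underneath.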

When to Use Unsloth

Use Unsloth when you want to fine-tune a language model on your own data and your hardware budget is a single consumer GPU rather than a cloud cluster. It is ideal for creating domain-specific models, instruction-tuning base models on custom datasets, and building specialized assistants that outperform general-purpose models on your specific tasks.

Ecosystem Role

Unsloth occupies the training and adaptation layer of the local AI stack. It takes pre-trained models as input and produces fine-tuned models as output. Those models then flow into inference tools — export to GGUF and serve with Ollama, or push to Hugging Face and load with vLLM. For inference only, use Ollama or llama.cpp. For training, Unsloth is the most accessible option on consumer hardware.