Local AI Tools Directory
Every major tool in the local AI ecosystem — reviewed, compared, and organized. We cover the full landscape, not just our own projects.
45 tools and counting
Inference Engines
Run models on your hardware
ExLlamaV2
Fastest inference engine for consumer NVIDIA GPUs. Custom CUDA kernels and EXL2 quantization format deliver maximum tokens per second on desktop hardware.
llama-cpp-python
Python bindings for llama.cpp providing a high-level API and OpenAI-compatible server. The easiest way to use llama.cpp from Python applications.
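As a hedged sketch of the OpenAI-compatible server that ships with llama-cpp-python (started with `python -m llama_cpp.server --model your-model.gguf`, default port 8000): the helper builds a standard chat-completion request body using only the standard library. The base URL and model name are placeholders for your setup.

```python
# Sketch: calling llama-cpp-python's OpenAI-compatible server.
# Host, port, and model name are assumptions; adjust for your setup.
import json
import urllib.request

def chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires the server to be running locally.
    print(chat("http://localhost:8000", "local-model", "Hello!"))
```

Because the server speaks the OpenAI wire format, the official `openai` client also works by pointing its `base_url` at the same endpoint.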
llama.cpp
The foundational C/C++ inference engine that pioneered consumer-hardware LLM deployment via quantization. Powers Ollama, LM Studio, GPT4All, and KoboldCpp.
MLX
Apple's machine learning framework for Apple Silicon. Leverages unified memory architecture for efficient LLM inference on Mac with minimal data copying.
Mullama
Versatile local LLM inference engine by Cognisoc with multi-language bindings for Python, Node.js, Go, Rust, PHP, and C/C++. Supports daemon server and embedded modes.
Ollama
Single-binary LLM runner with built-in model registry, automatic GPU detection, and OpenAI-compatible REST API. The easiest way to run AI locally.
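A minimal sketch of Ollama's native REST API using only the standard library; assumes the daemon is running on its default port (11434) and that the model has already been pulled. The model name and prompt are placeholders.

```python
# Sketch: Ollama's native REST API (default port 11434).
# Assumes `ollama serve` is running and the model is pulled.
import json
import urllib.request

def generate_payload(model: str, prompt: str) -> dict:
    """Request body for POST /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    """Send a one-shot generation request and return the response text."""
    body = json.dumps(generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate("llama3.2", "Why is the sky blue?"))
```

Ollama also exposes an OpenAI-compatible endpoint under `/v1`, so existing OpenAI client code can usually be repointed without changes.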
TensorRT-LLM
NVIDIA's high-performance LLM inference library. Achieves the highest throughput on NVIDIA GPUs with custom CUDA kernels, quantization, and in-flight batching.
vLLM
High-throughput LLM serving engine with PagedAttention, continuous batching, and tensor parallelism. Designed for multi-user production serving at scale.
Desktop Apps
GUI applications for local AI
GPT4All
Free desktop chatbot by Nomic AI that runs LLMs on consumer CPUs. Features LocalDocs for private document Q&A with no GPU required.
Jan
Open-source ChatGPT alternative that runs 100% offline. Clean desktop app with extension ecosystem, local API server, and cross-platform support.
KoboldCpp
Single-file portable LLM runner with built-in chat UI, story mode, Whisper speech-to-text, and TTS. Optimized for creative writing and roleplay.
LM Studio
The most comprehensive local LLM desktop application. Discover, download, and chat with models through a polished UI with built-in OpenAI-compatible API server.
Msty
Clean, modern desktop AI chat application with split-screen model comparison, offline mode, and support for local and remote LLM providers.
Text Generation WebUI
Feature-rich Gradio web interface by oobabooga supporting multiple inference backends including llama.cpp, ExLlamaV2, Transformers, and AutoGPTQ.
Web Interfaces
Self-hosted chat platforms
AnythingLLM
All-in-one AI application with workspace-based RAG, document ingestion, built-in vector database, and multi-user support. Chat with your documents locally.
LibreChat
Multi-provider AI chat platform with MCP support, AI Agents, plugins, and multi-user auth. Self-hosted alternative to ChatGPT with enterprise features.
Open WebUI
Self-hosted ChatGPT-like interface with 130K+ GitHub stars. Clean design, model selector, markdown rendering, plugin ecosystem, and multi-user authentication.
SillyTavern
Advanced roleplay and storytelling chat frontend with Visual Novel mode, character cards, world-building tools, and extensive customization for creative AI interaction.
Server / API
Serve models to applications
LocalAI
OpenAI API drop-in replacement that runs LLMs, image generation, audio transcription, and embedding models locally. No GPU required, fully self-hosted.
Text Generation Inference (TGI)
Hugging Face's production-grade inference server for LLMs. Optimized for throughput with continuous batching, tensor parallelism, and Flash Attention.
Mobile AI
On-device AI for phones
Llamafu
Flutter plugin by Cognisoc enabling on-device AI inference on Android and iOS with complete privacy. Supports text generation, chat, vision, function calling, and structured JSON output.
MLC LLM
Machine Learning Compilation framework for deploying LLMs on mobile devices, browsers, and edge hardware. Native iOS, Android, and WebGPU support.
Developer SDKs
Build AI applications
Guidance
Microsoft's constrained generation DSL that interleaves text templates with LLM generation. Control output structure with selects, regex, and grammar rules.
Haystack
Production-ready AI pipeline framework by deepset. Build composable RAG, question answering, and agent pipelines with modular components and any LLM backend.
LangChain
The dominant LLM application framework with 90K GitHub stars. Build chains, agents, and RAG pipelines with local models via Ollama, llama.cpp, or any OpenAI-compatible API.
LlamaIndex
Leading data framework for building RAG and agentic applications over private data. 30K+ GitHub stars, 300+ data connectors, and production-ready pipelines.
Outlines
Structured text generation library using finite-state machines to guarantee valid JSON, regex patterns, and grammar-conforming output from any LLM.
Semantic Kernel
Microsoft's open-source SDK for integrating LLMs into .NET, Python, and Java applications. Enterprise-focused with planners, plugins, and AI agent patterns.
Fine-Tuning
Train and customize models
Axolotl
Config-driven fine-tuning framework supporting LoRA, QLoRA, full fine-tuning, multi-GPU, FSDP2, and DeepSpeed. Simplifies training with YAML configuration.
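A hypothetical QLoRA config sketch in Axolotl's YAML style; the model path, dataset, and hyperparameters are placeholders to adapt before launching with Axolotl's CLI:

```yaml
# Hypothetical Axolotl QLoRA config; all values are placeholders.
base_model: meta-llama/Llama-3.1-8B
load_in_4bit: true
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
datasets:
  - path: ./data/train.jsonl
    type: alpaca
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/qlora-llama
```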
LLaMA Factory
Web UI-driven fine-tuning framework supporting 100+ model architectures. One-click training with LoRA, QLoRA, RLHF, DPO, and comprehensive evaluation.
Unsloth
2x faster LLM training with 80% less memory via custom Triton attention kernels. Fine-tune 70B models with QLoRA on a single consumer GPU with 24GB VRAM.
Vector Databases
Store and search embeddings
ChromaDB
Lightweight, local-first open-source vector database. The default embedding store for RAG applications with simple Python/JS APIs and zero-config setup.
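A minimal sketch assuming `pip install chromadb`; the collection name and file are made up, and the `chunk()` helper is plain Python showing the typical pre-ingestion splitting step:

```python
# Sketch: ingest and query documents with ChromaDB (names are placeholders).
def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size pieces before embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

if __name__ == "__main__":
    import chromadb  # not stdlib; install separately

    client = chromadb.Client()  # in-memory, zero-config
    coll = client.create_collection("docs")
    pieces = chunk(open("notes.txt").read())
    coll.add(documents=pieces,
             ids=[f"chunk-{i}" for i in range(len(pieces))])
    # Chroma embeds the query with its default model and returns nearest chunks.
    print(coll.query(query_texts=["vector databases"], n_results=2))
```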
FAISS
Meta's C++ library for efficient similarity search and dense vector clustering. Industry standard for billion-scale nearest neighbor search with IVF and HNSW indexes.
pgvector
PostgreSQL extension for vector similarity search. Store embeddings alongside relational data in your existing Postgres database with HNSW and IVFFlat indexes.
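A SQL sketch of the basic pgvector workflow; the table, 3-dimension column (real embedding models use hundreds of dimensions), and query vector are illustrative placeholders:

```sql
-- Sketch of pgvector usage; table and dimensions are placeholders.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(3)  -- match your embedding model's dimension
);

-- HNSW index for fast approximate nearest-neighbor search (L2 distance)
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Five nearest rows to a query embedding (<-> is L2 distance)
SELECT id, content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;
```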
Qdrant
High-performance vector database written in Rust. Production-grade similarity search with advanced filtering, multi-tenancy, and horizontal scaling.
Weaviate
Open-source vector database with hybrid vector and keyword search, GraphQL API, built-in vectorization modules, and multi-tenancy for production AI applications.
Voice & Audio
Speech-to-text and TTS
Kokoro TTS
Lightweight, high-quality text-to-speech model using ONNX runtime. Sub-second latency, natural prosody, and minimal resource usage for local voice synthesis.
Piper TTS
Fast, lightweight neural text-to-speech system that runs on CPU. Optimized for Raspberry Pi and edge devices with 30+ languages and natural-sounding voices.
Whisper
OpenAI's open-source speech-to-text model. Run locally via faster-whisper (CTranslate2) or Whisper.cpp for real-time transcription in 100+ languages.
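A sketch of local transcription via faster-whisper, assuming `pip install faster-whisper` and an audio file on disk; the model size and file name are placeholders, and the timestamp formatter is plain Python:

```python
# Sketch: local transcription with faster-whisper (file name is a placeholder).
def fmt_ts(seconds: float) -> str:
    """Render seconds as M:SS.mmm for transcript display."""
    m, s = divmod(seconds, 60.0)
    return f"{int(m)}:{s:06.3f}"

if __name__ == "__main__":
    from faster_whisper import WhisperModel  # not stdlib

    model = WhisperModel("small", compute_type="int8")  # CPU-friendly
    segments, info = model.transcribe("meeting.wav")
    for seg in segments:
        print(f"[{fmt_ts(seg.start)} -> {fmt_ts(seg.end)}] {seg.text}")
```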
Image & Vision
Image generation and understanding
AUTOMATIC1111 Stable Diffusion WebUI
The original Stable Diffusion web interface with the largest extension ecosystem. Feature-rich UI for text-to-image, img2img, inpainting, and more.
ComfyUI
Node-based visual workflow editor for Stable Diffusion and generative AI. Build complex image, video, and audio generation pipelines with drag-and-drop nodes.
Code Assistants
AI-powered coding tools
Aider
Terminal-based AI pair programming tool. Edit code in your repo through conversation with automatic git commits, multi-file editing, and broad LLM support.
Continue
Open-source AI code assistant for VS Code and JetBrains. Local Copilot alternative with tab autocomplete, chat, and inline editing powered by any LLM.
Tabby
Self-hosted AI coding assistant with team features. GitHub Copilot alternative with code completion, chat, repository-aware context, and admin dashboard.
Educational
Learn AI internals