Category: Developer SDK | License: MIT

Guidance

Microsoft's constrained generation DSL that interleaves text templates with LLM generation. Control output structure with selects, regex, and grammar rules.

Platforms: cross-platform

Guidance is a constrained generation library from Microsoft that provides a domain-specific language (DSL) for controlling LLM output structure. It lets developers interleave fixed text templates with constrained generation blocks, creating programs where the model fills in specific parts while the template guarantees the overall structure. For developers who want fine-grained programmatic control over what an LLM generates and where, Guidance offers a highly expressive template-based approach to structured output.

Key Features

Template-based generation. Guidance programs read like templates in which fixed text is interspersed with generation directives. Early releases used a Handlebars-style syntax; newer releases express the same idea in Python by appending text and generation calls to a model object. In both forms the model generates only within designated blocks, and everything else is fixed text. This gives precise control over output structure while letting the model contribute content where needed.
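The interleaving described above can be illustrated with a small pure-Python sketch. This is not Guidance's API: `Gen`, `run_template`, and `stub_model` are hypothetical names, and the "model" is a canned function standing in for an LLM. The point is only that the model is called for the marked slots and never for the fixed text.

```python
from typing import Callable, Union


class Gen:
    """Marks a slot in the template that the model is allowed to fill."""

    def __init__(self, name: str):
        self.name = name


Part = Union[str, Gen]


def run_template(parts: list[Part], model: Callable[[str, str], str]) -> tuple[str, dict[str, str]]:
    """Render a template, calling the model only for Gen slots."""
    text = ""
    captures: dict[str, str] = {}
    for part in parts:
        if isinstance(part, Gen):
            # The model sees everything rendered so far as its prompt.
            value = model(text, part.name)
            captures[part.name] = value
            text += value
        else:
            # Fixed text is copied verbatim; the model never produces it.
            text += part
    return text, captures


def stub_model(prompt: str, slot: str) -> str:
    """Hypothetical stand-in for an LLM: returns canned text per slot."""
    return {"name": "Ada", "role": "engineer"}[slot]


template = ['{"name": "', Gen("name"), '", "role": "', Gen("role"), '"}']
output, captured = run_template(template, stub_model)
print(output)
```

Because the braces, keys, and quotes come from the template rather than from the model, the overall shape of the output does not depend on what the model returns for each slot.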

Select and constrain. The select primitive limits model output to a predefined set of options, which suits classification, multiple choice, and decision routing. Combined with regex constraints and grammar rules, it enables complex structured outputs that conform to the specified constraints by construction.
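One way to picture a select-style constraint is as a filter on what may come next: at every step, only continuations that keep the output a prefix of some allowed option are eligible. The sketch below works on characters instead of real tokens, and `allowed_next`, `constrained_select`, and `toy_score` are hypothetical names, not part of Guidance.

```python
from typing import Callable

EOS = "<eos>"  # hypothetical end-of-output marker


def allowed_next(prefix: str, options: list[str]) -> set[str]:
    """Characters (or EOS) that keep the output a prefix of some option."""
    allowed = set()
    for opt in options:
        if opt == prefix:
            allowed.add(EOS)
        elif opt.startswith(prefix):
            allowed.add(opt[len(prefix)])
    return allowed


def constrained_select(options: list[str], score: Callable[[str, str], float]) -> str:
    """Greedy decode where only option-compatible characters are eligible."""
    out = ""
    while True:
        candidates = allowed_next(out, options)
        # sorted() makes tie-breaking deterministic for this toy example.
        best = max(sorted(candidates), key=lambda c: score(out, c))
        if best == EOS:
            return out
        out += best


def toy_score(prefix: str, char: str) -> float:
    """Hypothetical model preference over next characters."""
    prefs = {"n": 3.0, "e": 2.0, EOS: 0.5}
    return prefs.get(char, 1.0)


label = constrained_select(["positive", "negative", "neutral"], toy_score)
print(label)
```

Whatever the scoring function prefers, the loop can only ever return one of the listed options, which is the property that makes select useful for routing and classification.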

Token healing. Guidance implements token healing to handle the boundary between template text and generated text correctly. This prevents the tokenization artifacts that occur when forcing specific prefixes, ensuring seamless transitions between fixed and generated content.
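The boundary problem can be shown with a toy tokenizer. The sketch below is a simplified illustration, not Guidance's implementation: the vocabulary and the `tokenize` and `heal` helpers are hypothetical, and the healing rule is reduced to "drop the last prompt token, then allow only tokens that start with the dropped text".

```python
VOCAB = ["http", "://", ":", "//", "example", ".com"]  # hypothetical toy vocabulary


def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer over the toy vocabulary.

    Raises ValueError if some position matches no vocabulary entry.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = max((t for t in VOCAB if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt: str) -> tuple[list[str], list[str]]:
    """Back up one token and return (kept tokens, allowed next tokens)."""
    tokens = tokenize(prompt)
    removed = tokens[-1]
    # The next token must start with the removed text, so the prompt is
    # still reproduced exactly, but longer merged tokens become available.
    allowed = [t for t in VOCAB if t.startswith(removed)]
    return tokens[:-1], allowed


kept, allowed = heal("http:")
print(kept, allowed)
```

Without healing, the prompt "http:" ends in the token ":" and the model would have to continue with "//", a split it may rarely have seen. After backing up, the single token "://" becomes a legal choice again.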

Stateful programs. Guidance programs maintain state across multiple generation steps. Capture the output of one generation block into a variable and use it to condition subsequent generation. This enables multi-step reasoning chains where each step builds on previous outputs.
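A minimal sketch of the capture-then-condition pattern, under the assumption that the model is just a function from prompt to text. `run_chain` and `stub_model` are hypothetical names used for illustration; Guidance's own mechanism for naming and reading captures is not shown here.

```python
from typing import Callable


def run_chain(question: str, model: Callable[[str], str]) -> dict[str, str]:
    """Two-step program: classify first, then answer using the captured class."""
    state = {"question": question}

    # Step 1: capture the model's output into a named variable.
    state["category"] = model(f"Question: {question}\nCategory:")

    # Step 2: the captured value becomes part of the next prompt.
    state["answer"] = model(
        f"Question: {question}\nCategory: {state['category']}\nAnswer:"
    )
    return state


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM with canned behavior."""
    if prompt.endswith("Category:"):
        return "math"
    if "Category: math" in prompt:
        return "4"
    return "unknown"


state = run_chain("What is 2 + 2?", stub_model)
print(state)
```

The second call only produces a useful answer because the first step's output is present in its prompt, which is the dependency that multi-step reasoning chains rely on.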

Backend support. Guidance supports Hugging Face Transformers, llama.cpp, and remote APIs. For local models, it applies constraints at the token sampling level, so disallowed tokens cannot be produced. For API-based models, where that level of access is typically unavailable, it relies on whatever constrained generation features the provider exposes, so the available guarantees can be weaker.
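Applying a constraint at the token sampling level usually means masking the model's scores before a token is chosen. The following is a generic sketch of that technique with made-up logits; `mask_logits` and `softmax` are illustrative helpers, not functions from Guidance or any backend.

```python
import math


def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set the score of every disallowed token to negative infinity."""
    if not allowed & logits.keys():
        raise ValueError("constraint allows no token in the vocabulary")
    return {tok: (v if tok in allowed else -math.inf) for tok, v in logits.items()}


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert scores to probabilities; masked tokens get probability 0."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical raw scores: the unconstrained model would prefer "maybe".
logits = {"yes": 1.0, "no": 0.5, "maybe": 2.5}
probs = softmax(mask_logits(logits, {"yes", "no"}))
choice = max(probs, key=probs.get)
print(choice, probs)
```

Since the masked token has zero probability, neither greedy decoding nor sampling can select it. This requires access to the logits, which is why the approach applies to local backends.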

Speed optimizations. Because the template structure is known in advance, Guidance can insert the tokens for fixed text directly instead of having the model generate them one at a time. Only the constrained blocks require step-by-step generation, which makes the overall process faster than approaches in which the model must produce the entire output, scaffolding included.
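A rough back-of-the-envelope model of the saving, under a stated assumption: each generated token costs one sequential decode step, while each run of fixed text can be fed to the model in a single batched pass. The segment sizes and the `decode_steps` helper are hypothetical and chosen only for illustration; real speedups depend on the backend and the template.

```python
# Each segment is (kind, token_count); the numbers are made up.
SEGMENTS = [("fixed", 12), ("gen", 5), ("fixed", 8), ("gen", 3)]


def decode_steps(segments: list[tuple[str, int]], template_aware: bool) -> int:
    """Count sequential model passes under the simple cost model above."""
    steps = 0
    for kind, count in segments:
        if kind == "fixed" and template_aware:
            # Fixed text is appended in one batched pass, not token by token.
            steps += 1
        else:
            # Every other token is produced by its own decode step.
            steps += count
    return steps


naive = decode_steps(SEGMENTS, template_aware=False)
aware = decode_steps(SEGMENTS, template_aware=True)
print(naive, aware)
```

In this toy case the scaffolding makes up most of the output, so skipping its token-by-token generation removes most of the sequential steps.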

When to Use Guidance

Choose Guidance when you need template-based control over LLM output with guaranteed structure. It excels for multi-step extraction tasks, structured reasoning chains, form filling, and any workflow where you need to alternate between fixed scaffolding and model-generated content.

Ecosystem Role

Guidance and Outlines are two widely used constrained generation libraries. Guidance takes a template-first approach where you write programs mixing text and generation, while Outlines takes a schema-first approach where you specify the output format. Guidance is more expressive for complex multi-step programs; Outlines is simpler for straightforward JSON extraction. Both work with local inference backends.