Local image generation lets you create AI art, product mockups, design concepts, and visual content entirely on your own hardware. No cloud services, no content filters, no per-image fees. This guide covers the two major model families (Stable Diffusion and FLUX), the most powerful interface (ComfyUI), and essential techniques like ControlNet, LoRAs, and prompt engineering that transform basic generation into a professional workflow.
Understanding the Models
Stable Diffusion Family
SD 1.5 (2022): The original. Lightweight, massive ecosystem of fine-tunes and LoRAs.
- Resolution: 512x512 native
- VRAM: 4+ GB
- Ecosystem: Thousands of community models and LoRAs
SDXL (2023): Significant quality upgrade. Better composition, lighting, and detail.
- Resolution: 1024x1024 native
- VRAM: 6-8+ GB
- Ecosystem: Growing rapidly, many SDXL-specific models
SD 3.5 (2024): Latest from Stability AI. Improved text rendering and coherence.
- Resolution: 1024x1024 native
- VRAM: 8-12+ GB
- Ecosystem: Still developing
FLUX Family
FLUX.1 Schnell (Fast): Optimized for speed. 4 steps instead of 20+.
- Resolution: 1024x1024
- VRAM: 8-12 GB
- Speed: 3-8 seconds on good hardware
FLUX.1 Dev: Higher quality, more steps.
- Resolution: Up to 2048x2048
- VRAM: 12-24 GB
- Speed: 15-30 seconds
- License: Non-commercial
FLUX.1 Pro: API-only commercial version.
Quick Comparison
| Feature | SD 1.5 | SDXL | FLUX.1 Dev |
|---|---|---|---|
| Quality | Good | Very good | Excellent |
| VRAM | 4 GB | 8 GB | 12-24 GB |
| Speed | Fast | Medium | Slower |
| Text in images | Poor | Fair | Good |
| Prompt adherence | Fair | Good | Excellent |
| LoRA ecosystem | Huge | Large | Growing |
| ControlNet | Yes | Yes | Yes |
| License | Open | Open | Non-commercial |
Setting Up ComfyUI
ComfyUI is a node-based interface for image generation. It’s more powerful than Automatic1111’s WebUI, offering full control over the generation pipeline through visual workflows.
Installation
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install PyTorch (NVIDIA GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Install PyTorch (Apple Silicon)
pip install torch torchvision torchaudio
# Install PyTorch (CPU only)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install ComfyUI dependencies
pip install -r requirements.txt
# Start ComfyUI
python main.py
# Open http://127.0.0.1:8188 in your browser
Docker Installation
# NVIDIA GPU
docker run -d \
--gpus all \
-p 8188:8188 \
-v comfyui_data:/app/output \
-v comfyui_models:/app/models \
--name comfyui \
ghcr.io/ai-dock/comfyui:latest
ComfyUI Manager (Essential)
ComfyUI Manager adds a GUI for installing custom nodes, models, and extensions:
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# Restart ComfyUI
# Click "Manager" in the menu bar to access
Downloading Models
Stable Diffusion Models
Place checkpoint files in ComfyUI/models/checkpoints/:
cd ComfyUI/models/checkpoints
# SDXL Base (6.9 GB)
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# Popular community models (from CivitAI or Hugging Face):
# - Juggernaut XL (photorealistic)
# - DreamShaper XL (versatile)
# - RealVisXL (photorealistic)
# - Pony Diffusion XL (stylized)
FLUX Models
cd ComfyUI/models
# FLUX.1 Dev (23.8 GB for full model)
# Use the fp8 quantized version to save VRAM:
cd unet
wget https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/flux1-dev-fp8.safetensors
# FLUX.1 Schnell (fast version)
wget https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors
# FLUX text encoder (T5 XXL, required)
cd ../clip
wget https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/t5xxl_fp8_e4m3fn.safetensors
wget https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/clip_l.safetensors
# FLUX VAE
cd ../vae
wget https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/ae.safetensors
VAE Models
cd ComfyUI/models/vae
# SDXL VAE (for better color accuracy)
wget https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
Basic Image Generation Workflows
SDXL Workflow in ComfyUI
The default ComfyUI workflow generates images with SDXL. Here’s what each node does:
[Load Checkpoint] → loads the SDXL model
↓
[CLIP Text Encode (Positive)] → your prompt
↓
[CLIP Text Encode (Negative)] → what to avoid
↓
[KSampler] → the denoising/generation process
↓
[VAE Decode] → converts latent to image
↓
[Save Image] → saves to output folder
Key parameters in KSampler:
- Steps: 20-30 for SDXL (higher = more detail, slower)
- CFG (Classifier-Free Guidance): 7-8 for SDXL (higher = more prompt adherence)
- Sampler: `euler` or `dpmpp_2m` for speed, `dpmpp_2m_sde` for quality
- Scheduler: `karras` (recommended)
- Denoise: 1.0 for text-to-image, 0.3-0.7 for image-to-image
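These settings map directly onto the KSampler node in ComfyUI’s API-format workflow JSON. A sketch of the node as a Python dict (node IDs and wiring are illustrative; export your own workflow with “Save (API Format)” to see the real ones):

```python
# Sketch of a KSampler node as it appears in ComfyUI's API-format workflow JSON.
# Node IDs ("3"-"7") and wiring are illustrative, not from a real export.
ksampler_node = {
    "class_type": "KSampler",
    "inputs": {
        "seed": 42,
        "steps": 25,               # 20-30 for SDXL
        "cfg": 7.5,                # 7-8 for SDXL
        "sampler_name": "dpmpp_2m",
        "scheduler": "karras",
        "denoise": 1.0,            # 1.0 for text-to-image
        "model": ["4", 0],         # [source node id, output index]
        "positive": ["6", 0],      # positive CLIP Text Encode
        "negative": ["7", 0],      # negative CLIP Text Encode
        "latent_image": ["5", 0],  # Empty Latent Image
    },
}
```

The `["4", 0]` pairs are references to other nodes: the source node’s ID and which of its outputs to use.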
FLUX Workflow
FLUX uses a different pipeline than Stable Diffusion:
[Load Diffusion Model] → FLUX unet
[Load CLIP] → T5 encoder + CLIP-L
[Load VAE] → FLUX VAE
↓
[CLIP Text Encode] → prompt (no negative prompt needed for FLUX)
↓
[KSampler] → steps: 20-28, CFG: 1.0 (guidance is distilled into the model, so CFG stays at 1.0)
↓
[VAE Decode] → [Save Image]
FLUX-specific settings:
- Steps: 4 for Schnell, 20-28 for Dev
- CFG: 1.0 (FLUX uses guidance scale differently)
- Sampler: `euler` for Schnell, `euler` or `dpmpp_2m` for Dev
- Scheduler: `simple` for Schnell, `normal` for Dev
Prompt Engineering for Image Generation
SDXL Prompt Structure
# Positive prompt
A professional photograph of a mountain landscape at golden hour,
dramatic clouds, snow-capped peaks, alpine meadow with wildflowers,
crystal clear lake reflection, 8K resolution, photorealistic,
shot on Canon EOS R5, 24mm wide angle lens, f/11
# Negative prompt
blurry, low quality, distorted, deformed, ugly, bad anatomy,
watermark, text, signature, jpeg artifacts, low resolution
FLUX Prompt Style
FLUX responds better to natural language descriptions:
A cozy coffee shop on a rainy afternoon. Through the window,
you can see people with umbrellas on a wet cobblestone street.
Inside, warm lighting illuminates wooden tables, stacked books,
and a steaming cup of latte with leaf art. The atmosphere is
warm and inviting, with a slight film grain quality.
FLUX typically does not need negative prompts.
Prompt Tips
| Technique | Example | Effect |
|---|---|---|
| Quality modifiers | “masterpiece, 8K, detailed” | Higher overall quality |
| Camera terms | “shot on Canon, 85mm, f/1.8, bokeh” | Photographic style |
| Lighting | “golden hour, dramatic lighting, rim light” | Mood and atmosphere |
| Style reference | “in the style of Studio Ghibli” | Artistic direction |
| Composition | “rule of thirds, leading lines, symmetrical” | Better framing |
| Specificity | “red 1967 Ford Mustang” vs “a car” | More accurate results |
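A small helper (hypothetical, not part of any library) that assembles a prompt from the components in the table:

```python
def build_prompt(subject, style=None, lighting=None, camera=None, quality=None):
    """Assemble a comma-separated SDXL-style prompt from optional components."""
    parts = [subject] + [p for p in (style, lighting, camera, quality) if p]
    return ", ".join(parts)

prompt = build_prompt(
    "red 1967 Ford Mustang on a coastal highway",
    lighting="golden hour, rim light",
    camera="shot on Canon, 85mm, f/1.8, bokeh",
    quality="masterpiece, 8K, detailed",
)
# → "red 1967 Ford Mustang on a coastal highway, golden hour, rim light,
#    shot on Canon, 85mm, f/1.8, bokeh, masterpiece, 8K, detailed"
```

Keeping components separate like this makes it easy to A/B test one element (say, lighting) while holding the rest of the prompt constant.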
ControlNet
ControlNet lets you guide image generation using reference images for pose, edges, depth, or other structural information.
Installing ControlNet
# Install ControlNet nodes for ComfyUI
cd ComfyUI/custom_nodes
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
pip install -r comfyui_controlnet_aux/requirements.txt
# Download ControlNet models
cd ComfyUI/models/controlnet
# For SDXL
wget https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.safetensors \
-O sdxl-controlnet-canny.safetensors
# For FLUX
wget https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny/resolve/main/diffusion_pytorch_model.safetensors \
-O flux-controlnet-canny.safetensors
ControlNet Types
| Type | Input | Use Case |
|---|---|---|
| Canny | Edge detection | Preserve structure/outlines |
| Depth | Depth map | Maintain 3D composition |
| OpenPose | Pose skeleton | Control human poses |
| Scribble | Hand-drawn lines | Quick sketches to images |
| Tile | Image tiles | Upscaling with detail |
| IP-Adapter | Reference image | Style/subject transfer |
| Inpainting | Masked region | Edit parts of images |
ControlNet Workflow
[Load Image] → [Canny Edge Detector] → canny_image
[Load ControlNet Model] → controlnet
↓
[Apply ControlNet] ← (conditioning from CLIP, controlnet, canny_image)
↓
[KSampler] → (uses controlled conditioning)
↓
[VAE Decode] → [Save Image]
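In API-format JSON, the Apply ControlNet step becomes a node that takes conditioning, a ControlNet model, and the preprocessed image. A sketch (node IDs are illustrative, and the class name `ControlNetApply` should be verified against your ComfyUI version):

```python
# Sketch of an Apply ControlNet node in API-format JSON; IDs are illustrative.
apply_controlnet = {
    "class_type": "ControlNetApply",
    "inputs": {
        "conditioning": ["6", 0],  # from CLIP Text Encode
        "control_net": ["10", 0],  # from Load ControlNet Model
        "image": ["11", 0],        # from the Canny Edge Detector
        "strength": 0.8,           # how strongly edges constrain generation
    },
}
```

Lower `strength` values let the prompt override the reference structure; values near 1.0 follow the detected edges closely.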
LoRAs (Low-Rank Adaptations)
LoRAs are small add-on models that modify the base model’s output for specific styles, characters, or concepts.
Using LoRAs
# Place LoRA files in:
ComfyUI/models/loras/
# Popular LoRA sources:
# - CivitAI (civitai.com)
# - Hugging Face
In ComfyUI, add a “Load LoRA” node right after the checkpoint loader; it patches both the MODEL output (which goes on to the KSampler) and the CLIP output used by the text encoders:
[Load Checkpoint] → [Load LoRA] → [CLIP Text Encode]
↑
lora_file: my_style.safetensors
strength_model: 0.7
strength_clip: 0.7
LoRA strength: 0.5-0.8 for subtle effect, 0.8-1.0 for strong effect. Too high can distort the image.
Stacking Multiple LoRAs
[Load Checkpoint] → [Load LoRA 1] → [Load LoRA 2] → [CLIP Encode]
style.safetensors character.safetensors
strength: 0.6 strength: 0.8
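In API-format JSON, stacking is just chaining: each `LoraLoader` consumes the previous node’s MODEL and CLIP outputs. A sketch with illustrative node IDs:

```python
# Two chained LoraLoader nodes in API-format JSON; node IDs are illustrative.
lora_1 = {
    "class_type": "LoraLoader",
    "inputs": {
        "lora_name": "style.safetensors",
        "strength_model": 0.6,
        "strength_clip": 0.6,
        "model": ["4", 0],   # MODEL from Load Checkpoint
        "clip": ["4", 1],    # CLIP from Load Checkpoint
    },
}
lora_2 = {
    "class_type": "LoraLoader",
    "inputs": {
        "lora_name": "character.safetensors",
        "strength_model": 0.8,
        "strength_clip": 0.8,
        "model": ["12", 0],  # MODEL from the first LoraLoader
        "clip": ["12", 1],   # CLIP from the first LoraLoader
    },
}
```

Order can matter when LoRAs touch overlapping concepts; if two stacked LoRAs fight each other, lower both strengths before reordering.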
VRAM Management
Reducing VRAM Usage
# Start ComfyUI with optimizations
# Low VRAM mode (offloads to CPU)
python main.py --lowvram
# Very low VRAM mode (aggressive offloading)
python main.py --novram
# Use fp16 (half precision)
python main.py --force-fp16
# Apple Silicon
python main.py --force-fp16
# Specify GPU
python main.py --cuda-device 0
VRAM Requirements by Configuration
| Task | VRAM Needed | Notes |
|---|---|---|
| SD 1.5, 512x512 | 4 GB | Basic generation |
| SD 1.5 + ControlNet | 6 GB | With guidance |
| SDXL, 1024x1024 | 8 GB | Base generation |
| SDXL + ControlNet + LoRA | 10-12 GB | Full pipeline |
| FLUX Schnell (fp8) | 10 GB | Fast mode |
| FLUX Dev (fp8) | 12 GB | Quality mode |
| FLUX Dev (fp16) | 24 GB | Maximum quality |
| FLUX + ControlNet | 16-24 GB | With guidance |
Tiled Generation for High Resolution
Generate images larger than your VRAM allows using tiling:
# In ComfyUI, use the "Tiled KSampler" node
# This generates the image in overlapping tiles
# Allows 2048x2048+ on 8 GB VRAM
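The idea behind tiling is simple: cover the image with overlapping tiles and blend the seams. A sketch of the tile-offset math (illustrative, not the Tiled KSampler’s actual implementation):

```python
def tile_starts(size, tile, overlap):
    """Start offsets so that overlapping `tile`-px tiles cover `size` px."""
    if tile >= size:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the edge
    return starts

tile_starts(2048, 1024, 128)  # → [0, 896, 1024]
```

The overlap region is where neighboring tiles are blended, which is why too little overlap produces visible seams.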
Image-to-Image and Inpainting
Image-to-Image
Transform an existing image using a prompt:
[Load Image] → [VAE Encode] → latent
↓
[KSampler] ← latent + conditioning
denoise: 0.5 (lower = closer to original)
↓
[VAE Decode] → [Save Image]
Denoise strength:
- 0.2-0.3: Subtle changes (color correction, minor style)
- 0.4-0.6: Moderate changes (style transfer, detail additions)
- 0.7-0.9: Major changes (significant transformation)
- 1.0: Full generation using image as rough guide only
Inpainting
Edit specific regions of an image:
[Load Image] → image
[Load Mask] → mask (white = generate, black = keep)
↓
[Set Latent Noise Mask] ← (latent from VAE Encode, mask)
↓
[KSampler] ← conditioned latent
denoise: 0.8
↓
[VAE Decode] → [Save Image]
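In API-format JSON, the masking step is a single node that attaches the mask to the latent; the KSampler then regenerates only the masked (white) region. A sketch (node IDs illustrative; the stock node class is `SetLatentNoiseMask`, but verify against your install):

```python
# Sketch of the inpainting mask step in API-format JSON; IDs are illustrative.
set_mask = {
    "class_type": "SetLatentNoiseMask",
    "inputs": {
        "samples": ["13", 0],  # latent from VAE Encode
        "mask": ["14", 0],     # mask loaded from the Load Image node
    },
}
```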
Upscaling
AI Upscaling Models
# Download upscale models to ComfyUI/models/upscale_models/
cd ComfyUI/models/upscale_models
# RealESRGAN x4 (general purpose)
wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth
# 4x-UltraSharp (popular, high quality)
# Download from CivitAI or Hugging Face
Upscale Workflow
[Load Image] → [Upscale Image (using Model)] → 4x larger image
↑
[Load Upscale Model]
# For even better results, combine with img2img:
[Upscaled Image] → [VAE Encode] → [KSampler (denoise: 0.3)] → [VAE Decode]
Batch Generation and Automation
ComfyUI API
ComfyUI has a REST API for automation:
import json
import urllib.request
def queue_prompt(workflow):
"""Send a workflow to ComfyUI for processing."""
data = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
"http://127.0.0.1:8188/prompt",
data=data,
headers={"Content-Type": "application/json"},
)
return json.loads(urllib.request.urlopen(req).read())
# Load a saved workflow JSON from ComfyUI
with open("workflow_api.json") as f:
workflow = json.load(f)
# Modify the prompt
workflow["6"]["inputs"]["text"] = "A beautiful sunset over the ocean"
# Queue it
result = queue_prompt(workflow)
print(f"Queued: {result}")
Batch Processing
prompts = [
"A serene mountain lake at dawn",
"A bustling city street at night, neon lights",
"A quiet library with warm lighting and old books",
"An alien landscape with two moons",
]
for i, prompt in enumerate(prompts):
workflow["6"]["inputs"]["text"] = prompt
workflow["3"]["inputs"]["seed"] = 42 + i # Different seed each
queue_prompt(workflow)
print(f"Queued image {i+1}: {prompt[:50]}...")
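When generating many variants, it is safer to copy the base workflow than to mutate it in place. A small helper (hypothetical) built on the standard library’s `copy` module:

```python
import copy

def make_variant(base_workflow, prompt, seed, prompt_node="6", sampler_node="3"):
    """Return a copy of an API-format workflow with a new prompt and seed.

    Node IDs default to the ones used in the loop above; adjust them to
    match your own exported workflow.
    """
    wf = copy.deepcopy(base_workflow)
    wf[prompt_node]["inputs"]["text"] = prompt
    wf[sampler_node]["inputs"]["seed"] = seed
    return wf
```

The deep copy keeps the base workflow pristine, so variants built for later queuing never share mutable state with each other.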
Alternative Interfaces
Automatic1111 (Stable Diffusion WebUI)
The older but still popular interface:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
./webui.sh # Linux/macOS
# or webui-user.bat on Windows
Fooocus (Simplest)
Midjourney-like simplicity for local generation:
git clone https://github.com/lllyasviel/Fooocus
cd Fooocus
python -m venv venv
source venv/bin/activate
pip install -r requirements_versions.txt
python entry_with_update.py
DiffusionBee (macOS Native)
Download from diffusionbee.com for a native macOS experience.
Next Steps
- Build AI workflows: Combine image generation with LLMs for automated content creation
- Set up RAG: Local RAG Chatbot for text-based applications
- Deploy for teams: Docker guide for multi-user access
- Fine-tune models: Fine-Tuning guide to create custom models