Z-Image ComfyUI Guide

A next-generation open-source image generation model built on the S3-DiT architecture.
6B Params · 8-Step Turbo · Photorealistic

Z-Image-Turbo (Recommended)

Distilled version, optimized for speed and low VRAM use.

  • Only 8 inference steps
  • Runs in under 16 GB of VRAM (consumer GPUs)
  • No negative prompt needed

Z-Image-Base

Non-distilled foundation model, intended as the base for community development.

  • Ideal for LoRA training
  • Community fine-tuning
  • Checkpoints released

Z-Image-Edit

Fine-tuned specifically for image editing tasks.

  • Strong instruction following
  • Precise local editing
  • Complex editing tasks

Installation Guide

Make sure you are running the latest version of ComfyUI, then download the files from Hugging Face and place them as follows.

# Structure
ComfyUI/
└── models/
    ├── text_encoders/
    │   └── qwen_3_4b.safetensors // Text Encoder
    ├── diffusion_models/
    │   └── z_image_turbo_bf16.safetensors // Main Model (FP8/GGUF optional)
    ├── vae/
    │   └── ae.safetensors // Flux 1 VAE
    └── model_patches/
        └── Z-Image-Turbo-Fun-Controlnet-Union.safetensors // (Optional) ControlNet
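
A misplaced file is a common reason a model fails to show up in a loader node, so it can help to check the layout before starting ComfyUI. The following is a minimal sketch, not part of ComfyUI itself: the function name check_layout and the REQUIRED/OPTIONAL lists are hypothetical, and the paths are simply copied from the structure above. It assumes you use the bf16 main model; adjust the filename if you downloaded an FP8 or GGUF variant.

```python
from pathlib import Path

# Relative paths copied from the layout in this guide (assumption: bf16 main model).
REQUIRED = [
    "models/text_encoders/qwen_3_4b.safetensors",
    "models/diffusion_models/z_image_turbo_bf16.safetensors",
    "models/vae/ae.safetensors",
]
# The ControlNet patch is optional, so it is reported separately.
OPTIONAL = [
    "models/model_patches/Z-Image-Turbo-Fun-Controlnet-Union.safetensors",
]


def check_layout(comfy_root):
    """Return (missing_required, missing_optional) as lists of relative paths."""
    root = Path(comfy_root)
    missing_required = [p for p in REQUIRED if not (root / p).is_file()]
    missing_optional = [p for p in OPTIONAL if not (root / p).is_file()]
    return missing_required, missing_optional


def report(comfy_root):
    """Print a short, human-readable summary of what is missing."""
    missing_required, missing_optional = check_layout(comfy_root)
    for path in missing_required:
        print(f"MISSING (required): {path}")
    for path in missing_optional:
        print(f"missing (optional): {path}")
    if not missing_required:
        print("All required files are in place.")
```

Calling report("ComfyUI") from the directory that contains your ComfyUI folder prints one line per missing file, or a confirmation when the three required files are present.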

Core Advantages

Bilingual: Native support for Chinese and English prompts, with strong rendering of complex text.

Uncensored: Supports uncensored generation modes for greater creative freedom.

Ecosystem: Works with ControlNet and LoRA extensions.

Prompt Tips

  • The Turbo model needs no negative prompt.
  • Add lighting keywords: "volumetric lighting", "cinematic lighting".
  • Be as specific as possible (scene, pose, texture).
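
The tips above can be sketched as a small prompt builder. This is only an illustration under stated assumptions: build_prompt and its parameter names are hypothetical, not part of ComfyUI or Z-Image, and the comma-separated ordering is one reasonable convention rather than a documented requirement. It returns a positive prompt only, since the Turbo model does not need a negative one.

```python
def build_prompt(subject, scene="", pose="", texture="",
                 lighting=("cinematic lighting",)):
    """Join the specific details into one comma-separated positive prompt.

    Empty or whitespace-only fields are skipped so the result has no
    stray commas.
    """
    parts = [subject, scene, pose, texture, *lighting]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a woman reading a paperback",
    scene="rainy cafe at dusk",
    pose="leaning on one elbow",
    texture="worn leather armchair",
    lighting=("volumetric lighting", "cinematic lighting"),
)
print(prompt)
```

The resulting string goes into the positive text-encode node of the workflow; the negative prompt can be left empty for Turbo.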

Resources