Master the Art of Z-Image Turbo LoRA Training
The definitive guide, built on the Ostris AI Toolkit: inject custom characters, styles, and objects into Alibaba Cloud's ultra-fast model without sacrificing its 8-step inference speed.
Extreme Speed
Generate high-quality images in just 8 steps (NFEs), reaching sub-second latency that traditional SDXL cannot match.
Photorealistic Style
Excels at realistic lighting and texture, making it especially well suited to portrait photography and cinematic LoRA training.
Efficient Training
Uses specialized De-distillation Adapters to prevent training from destroying the model's speed advantage.
Prerequisites
Ensure your hardware and environment meet the following requirements before starting.
Hardware
- Recommended: 24GB+ VRAM (RTX 3090/4090) for best speed.
- Minimum: 12GB VRAM (RTX 3060). Requires float8 quantization and memory optimizations (see the 12GB VRAM Savior Guide below).
Environment
- Cloud (Recommended): RunPod with the "Ostris AI Toolkit" template for one-click deployment.
- Local: Clone ostris/ai-toolkit and install its dependencies; a sketch follows this list.
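A minimal local-install sketch. Exact steps may differ between releases, so treat this as an outline and defer to the repository's README if anything diverges:

```bash
# Clone the toolkit and set up an isolated environment
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive  # pulls vendored dependencies, if any
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```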
6-Step Fast Track Training Process
Prepare Dataset
Dataset quality determines LoRA quality. Prepare 10-30 high-quality images.
- Resolution: 1024x1024 (Sweet spot). Use 768x768 for low VRAM.
- Diversity: Ensure different angles, lighting, and backgrounds to prevent overfitting.
- Captions: Create a .txt file with the same name as each image, e.g. img01.png -> img01.txt containing "[trigger], description...". See the example layout below.
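An example dataset layout. The trigger word z1mg_person is purely illustrative; substitute your own:

```
dataset/
├── img01.png
├── img01.txt    # "z1mg_person, close-up portrait, soft window light"
├── img02.png
├── img02.txt    # "z1mg_person, full body, walking through a market"
└── ...
```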
Launch AI Toolkit
We use the Ostris AI Toolkit Gradio interface for visual configuration.
```bash
python run.py --ui
```
RunPod users can skip this step and simply click "Connect to HTTP Port" after deployment.
Critical Configuration
Create a new Job in UI. Strictly follow these settings to preserve Turbo speed.
| Section | Setting | Value / Notes |
|---|---|---|
| MODEL | Path | Tongyi-MAI/Z-Image-Turbo (must use the training adapter preset) |
| TRAINING | Learning Rate | 0.0001 (too high ruins the output) |
| TRAINING | Steps / Batch Size | 2000-3000 steps, batch size 1 |
| TRAINING | Optimizer | AdamW8Bit |
| TARGET | Rank | 8-16 (16 for complex characters) |
| ADVANCED | Differential Output Preservation | Enabled |
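For readers who prefer ai-toolkit's YAML job files over the UI, here is a hedged sketch of the equivalent settings. The structure mirrors the toolkit's published example configs, but the key names (especially diff_output_preservation) are assumptions to verify against the Z-Image example config shipped with your toolkit version:

```yaml
# Sketch only: verify key names against ai-toolkit's bundled example configs.
job: extension
config:
  name: z_image_turbo_lora
  process:
    - type: sd_trainer
      network:
        type: lora
        linear: 16            # LoRA rank; 8-16 per the table above
        linear_alpha: 16
      train:
        batch_size: 1
        steps: 2500
        lr: 1e-4              # keep low to preserve the distillation
        optimizer: adamw8bit
        diff_output_preservation: true   # assumed key name for DOP
      model:
        name_or_path: "Tongyi-MAI/Z-Image-Turbo"
```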
Monitor & Selection
Watch the generated previews in the Samples tab. Early samples will resemble the base model; your concept emerges gradually. Pick the last .safetensors checkpoint saved before overfitting sets in.
Inference & Usage
The trained LoRA can be used directly in ComfyUI or Diffusers. Remember to include your trigger word in the prompt.
```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the base model
pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
).to("cuda")

# Load the trained LoRA
pipe.load_lora_weights("path/to/your_lora.safetensors")

# Inference with the trigger word (8 steps)
prompt = "<myconcept>, realistic photo of a person in city"
image = pipe(prompt, num_inference_steps=8, guidance_scale=4.5).images[0]
image.save("output.png")
```

12GB VRAM Savior Guide
- Limit resolution: max 768x768, or use bucketing.
- Caching: enable latent and text-embedding caching.
- Optimizer: switch to Adafactor.
- Learning Rate: raise to 0.0003.
- Steps: reduce to 1200-2000.
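The same 12GB ceiling applies at inference time. A minimal low-VRAM variant of the earlier diffusers example, assuming your diffusers build supports CPU offload for this pipeline:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
)
# Move submodules to the GPU only while they run; slower, but fits 12GB cards.
pipe.enable_model_cpu_offload()
pipe.load_lora_weights("path/to/your_lora.safetensors")

image = pipe(
    "<myconcept>, realistic photo of a person in city",
    num_inference_steps=8,
    guidance_scale=4.5,
).images[0]
image.save("output_low_vram.png")
```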
Common Issues
Blurry Images / Slow Speed?
A wrong adapter or too-high learning rate has likely destroyed the distillation. Use the default LR (0.0001) and make sure the de-distillation adapter is enabled.
Concept Leakage?
Backgrounds becoming part of your character? Try enabling DOP (Differential Output Preservation) and lowering the LoRA weight to 0.6-0.8 during inference, as sketched below.
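To lower the LoRA weight at inference in diffusers (assuming a diffusers version with the PEFT adapter backend; the adapter name my_lora is illustrative):

```python
# Load with an explicit adapter name, then scale its influence down.
pipe.load_lora_weights("path/to/your_lora.safetensors", adapter_name="my_lora")
pipe.set_adapters(["my_lora"], adapter_weights=[0.7])  # 0.6-0.8 per the tip above
```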
Lack of Face Detail?
Turbo models sometimes over-smooth skin. Add "highly detailed skin texture, raw photo" to your prompts, or add facial close-ups to the training data.