Master the Art of Z-Image Turbo LoRA Training

The definitive guide, based on the Ostris AI Toolkit. Inject custom characters, styles, and objects into Alibaba Cloud's ultra-fast model without sacrificing its 8-step inference speed.

Extreme Speed

Generate high-quality images in just 8 steps (NFEs), achieving sub-second latency that traditional SDXL cannot match.

Photorealistic Style

Excels at realistic lighting and texture, making it especially well suited to portrait photography and cinematic LoRA training.

Efficient Training

Uses specialized De-distillation Adapters to prevent training from destroying the model's speed advantage.

Prerequisites

Ensure your hardware and environment meet the following requirements before starting.

Hardware

  • Recommended: 24GB+ VRAM (RTX 3090/4090) for best speed.
  • Minimum: 12GB VRAM (RTX 3060); requires float8 and the memory optimizations covered in the guide below.

Environment

  • Cloud (Recommended): RunPod with the "Ostris AI Toolkit" template for one-click deployment.
  • Local: Clone ostris/ai-toolkit and install its dependencies (setup sketch below).
  • Alternative: Fal.ai also offers cloud training.
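
For the local route, a typical setup looks like this (the commands mirror the ai-toolkit README; exact dependency steps can vary between releases, so verify against the repo):

# Typical local setup - verify against the ai-toolkit README for your release
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt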

Fast-Track Training Process

Step 1: Prepare Dataset

Dataset quality determines LoRA quality. Prepare 10-30 high-quality images.

  • Resolution: 1024x1024 (the sweet spot); use 768x768 for low VRAM.
  • Diversity: vary angles, lighting, and backgrounds to prevent overfitting.
  • Captions: create a .txt file with the same name as each image, e.g., img01.png -> img01.txt containing "[trigger], description..." (a scripting sketch follows this list).
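
The caption convention is easy to script. Below is a minimal sketch that stubs out a matching .txt file for every image; the dataset folder name, trigger word, and placeholder description are all assumptions to replace with your own:

from pathlib import Path

DATASET_DIR = Path("dataset")  # assumed folder containing your training images
TRIGGER = "myconcept"          # replace with your trigger word

for img in sorted(DATASET_DIR.glob("*.png")):
    caption_file = img.with_suffix(".txt")
    if not caption_file.exists():
        # Lead with the trigger word; replace the placeholder with a short
        # description of what varies in this image (angle, lighting, background).
        caption_file.write_text(f"{TRIGGER}, description...")
        print(f"Created {caption_file.name}")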
Step 2: Launch AI Toolkit

We use the Ostris AI Toolkit Gradio interface for visual configuration.

# Local run command
python run.py --ui

RunPod users just click "Connect to HTTP Port" after deployment.

Step 3: Critical Configuration

Create a new Job in the UI. Follow these settings closely to preserve Turbo's speed (a matching config sketch follows the table).

Section    Setting
MODEL      Path: Tongyi-MAI/Z-Image-Turbo (must use the training adapter preset)
TRAINING   Learning Rate: 0.0001 (higher values destroy the distillation)
TRAINING   Steps: 2000-3000 / Batch Size: 1
TRAINING   Optimizer: AdamW8Bit
TARGET     Rank: 8-16 (16 for complex characters)
ADVANCED   Enable Differential Output Preservation (DOP)
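
If you prefer the toolkit's file-based workflow to the UI, the same settings map onto a YAML job config. The sketch below is illustrative only: the key names follow ai-toolkit's published example configs, and the Z-Image training adapter preset still has to be selected as shown in the table above.

# Illustrative job config; verify key names against the example
# configs shipped with your ai-toolkit version.
job: extension
config:
  name: "my_zimage_lora"
  process:
    - type: "sd_trainer"
      training_folder: "output"
      device: "cuda:0"
      network:
        type: "lora"
        linear: 16            # rank (8-16 per the table above)
        linear_alpha: 16
      save:
        dtype: float16
        save_every: 250       # frequent saves make checkpoint selection easier
      datasets:
        - folder_path: "dataset"
          caption_ext: "txt"
          resolution: [1024]
      train:
        batch_size: 1
        steps: 2500
        lr: 1e-4
        optimizer: "adamw8bit"
      model:
        name_or_path: "Tongyi-MAI/Z-Image-Turbo"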
Step 4: Monitor & Select

Watch the generated previews in the Samples tab. Early steps show the base model; your concept emerges gradually. Pick the last .safetensors checkpoint saved before overfitting sets in (a comparison sketch follows).
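
A practical way to pick the winner is to render the same prompt and seed with each saved checkpoint and compare. A minimal Diffusers sketch, assuming a hypothetical checkpoint naming pattern under output/ (adjust to whatever your job actually saves):

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.float16
).to("cuda")

prompt = "myconcept, portrait photo, soft window light"  # fixed test prompt
for step in [1000, 1500, 2000, 2500]:
    # Hypothetical filename pattern; match it to your output folder.
    pipe.load_lora_weights(f"output/my_zimage_lora_{step:06d}.safetensors")
    image = pipe(
        prompt,
        num_inference_steps=8,
        guidance_scale=4.5,
        generator=torch.Generator("cuda").manual_seed(42),  # fixed seed
    ).images[0]
    image.save(f"compare_{step}.png")
    pipe.unload_lora_weights()  # reset before loading the next checkpoint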

Step 5: Inference & Usage

The trained LoRA can be used directly in ComfyUI or Diffusers; remember to include your trigger word in the prompt.

Python (Diffusers)
import torch
from diffusers import AutoPipelineForText2Image

# Load base model
pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", 
    torch_dtype=torch.float16
).to("cuda")

# Load trained LoRA
pipe.load_lora_weights("path/to/your_lora.safetensors")

# Inference with trigger word (8 steps)
prompt = "<myconcept>, realistic photo of a person in city"
image = pipe(prompt, num_inference_steps=8, guidance_scale=4.5).images[0]
image.save("output.png")

12GB VRAM Survival Guide

Running on a 12GB card? Apply all of the following in the trainer (an inference-side sketch follows the list):

  • Limit resolution: max 768x768, or use bucketing.
  • Caching: enable Latents and Text Embeddings caching (required).
  • Optimizer: switch to Adafactor.
  • Learning Rate: adjust to 0.0003.
  • Steps: reduce to 1200-2000.
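
The toggles above live in the trainer UI, but the same memory pressure applies at inference on a 12GB card. A sketch of a low-VRAM Diffusers setup (the LoRA path is a placeholder):

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
)
# Offload idle submodules to CPU RAM instead of calling .to("cuda");
# generation is slower per image, but peak VRAM drops substantially.
pipe.enable_model_cpu_offload()
pipe.load_lora_weights("path/to/your_lora.safetensors")  # placeholder path
image = pipe("myconcept, portrait photo", num_inference_steps=8).images[0]
image.save("lowvram_output.png")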

Common Issues

Blurry Images / Slow Speed?

The wrong adapter or an overly high learning rate has likely destroyed the distillation. Use the default LR (0.0001) and make sure the de-distillation adapter is enabled.

Concept Leakage?

Backgrounds becoming part of your character? Enable Differential Output Preservation (DOP) and lower the LoRA weight to 0.6-0.8 at inference, as in the sketch below.
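
In Diffusers, the LoRA weight can be scaled down by registering it as a named adapter (this requires peft to be installed; the adapter name is arbitrary):

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.float16
).to("cuda")

# Register the LoRA under a name so its influence can be dialed down.
pipe.load_lora_weights("path/to/your_lora.safetensors", adapter_name="myconcept")
pipe.set_adapters(["myconcept"], adapter_weights=[0.7])  # try 0.6-0.8 to curb leakage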

Lack of Face Detail?

Turbo models sometimes over-smooth skin. Add "highly detailed skin texture, raw photo" to your prompts, or add facial close-ups to the training data.