Master the Art of Z-Image Turbo LoRA Training
The definitive guide, built on the Ostris AI Toolkit: inject custom characters, styles, and objects into Alibaba Cloud's ultra-fast model without sacrificing its 8-step inference speed.
Extreme Speed
Generate high-quality images in just 8 steps (NFEs), reaching sub-second latency that traditional SDXL cannot match.
Photorealistic Style
Excels at realistic lighting and texture, making it especially well suited to portrait photography and cinematic LoRA training.
Efficient Training
Uses specialized De-distillation Adapters to prevent training from destroying the model's speed advantage.
Prerequisites
Ensure your hardware and environment meet the following requirements before starting.
Hardware
- Recommended: 24GB+ VRAM (RTX 3090/4090) for best speed.
- Minimum: 12GB VRAM (RTX 3060). Requires float8 quantization and memory optimizations (see the 12GB VRAM Savior Guide below).
Environment
- Cloud (Recommended): RunPod with the "Ostris AI Toolkit" template for one-click deployment.
- Local: Clone ostris/ai-toolkit and install its dependencies; a sketch follows this list.
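A minimal local-install sketch. Exact steps may differ between releases, so treat this as an outline and defer to the repository's README if anything diverges:

```bash
# Clone the toolkit and set up an isolated environment
git clone https://github.com/ostris/ai-toolkit.git
cd ai-toolkit
git submodule update --init --recursive  # pulls vendored dependencies, if any
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```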
6-Step Fast Track Training Process
Prepare Dataset
Dataset quality determines LoRA quality. Prepare 10-30 high-quality images.
- Resolution: 1024x1024 (Sweet spot). Use 768x768 for low VRAM.
- Diversity: Ensure different angles, lighting, and backgrounds to prevent overfitting.
- Captions: Create a .txt file with the same name as each image, e.g. img01.png -> img01.txt containing "[trigger], description...". See the example layout below.
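An example dataset layout. The trigger word z1mg_person is purely illustrative; substitute your own:

```
dataset/
├── img01.png
├── img01.txt    # "z1mg_person, close-up portrait, soft window light"
├── img02.png
├── img02.txt    # "z1mg_person, full body, walking through a market"
└── ...
```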
Launch AI Toolkit
We use the Ostris AI Toolkit Gradio interface for visual configuration.
```bash
python run.py --ui
```
RunPod users can skip this step and simply click "Connect to HTTP Port" after deployment.
Critical Configuration
Create a new Job in UI. Strictly follow these settings to preserve Turbo speed.
| Section | Setting | Value / Notes |
|---|---|---|
| MODEL | Path | Tongyi-MAI/Z-Image-Turbo (must use the training adapter preset) |
| TRAINING | Learning Rate | 0.0001 (too high ruins the output) |
| TRAINING | Steps / Batch Size | 2000-3000 steps, batch size 1 |
| TRAINING | Optimizer | AdamW8Bit |
| TARGET | Rank | 8-16 (16 for complex characters) |
| ADVANCED | Differential Output Preservation | Enabled |
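For readers who prefer ai-toolkit's YAML job files over the UI, here is a hedged sketch of the equivalent settings. The structure mirrors the toolkit's published example configs, but the key names (especially diff_output_preservation) are assumptions to verify against the Z-Image example config shipped with your toolkit version:

```yaml
# Sketch only: verify key names against ai-toolkit's bundled example configs.
job: extension
config:
  name: z_image_turbo_lora
  process:
    - type: sd_trainer
      network:
        type: lora
        linear: 16            # LoRA rank; 8-16 per the table above
        linear_alpha: 16
      train:
        batch_size: 1
        steps: 2500
        lr: 1e-4              # keep low to preserve the distillation
        optimizer: adamw8bit
        diff_output_preservation: true   # assumed key name for DOP
      model:
        name_or_path: "Tongyi-MAI/Z-Image-Turbo"
```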
Monitor & Selection
Watch the generated previews in the Samples tab. Early samples will resemble the base model; your concept emerges gradually. Pick the last .safetensors checkpoint saved before overfitting sets in.
Inference & Usage
The trained LoRA can be used directly in ComfyUI or Diffusers. Remember to include your trigger word in the prompt.
```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the base model
pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
).to("cuda")

# Load the trained LoRA
pipe.load_lora_weights("path/to/your_lora.safetensors")

# Inference with the trigger word (8 steps)
prompt = "<myconcept>, realistic photo of a person in city"
image = pipe(prompt, num_inference_steps=8, guidance_scale=4.5).images[0]
image.save("output.png")
```

12GB VRAM Savior Guide
- Limit resolution: max 768x768, or use bucketing.
- Caching: enable latent and text-embedding caching.
- Optimizer: switch to Adafactor.
- Learning Rate: raise to 0.0003.
- Steps: reduce to 1200-2000.
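The same 12GB ceiling applies at inference time. A minimal low-VRAM variant of the earlier diffusers example, assuming your diffusers build supports CPU offload for this pipeline:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.float16,
)
# Move submodules to the GPU only while they run; slower, but fits 12GB cards.
pipe.enable_model_cpu_offload()
pipe.load_lora_weights("path/to/your_lora.safetensors")

image = pipe(
    "<myconcept>, realistic photo of a person in city",
    num_inference_steps=8,
    guidance_scale=4.5,
).images[0]
image.save("output_low_vram.png")
```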
Common Issues
Blurry Images / Slow Speed?
A wrong adapter or too-high learning rate has likely destroyed the distillation. Use the default LR (0.0001) and make sure the de-distillation adapter is enabled.
Concept Leakage?
Backgrounds becoming part of your character? Try enabling DOP (Differential Output Preservation) and lowering the LoRA weight to 0.6-0.8 during inference, as sketched below.
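To lower the LoRA weight at inference in diffusers (assuming a diffusers version with the PEFT adapter backend; the adapter name my_lora is illustrative):

```python
# Load with an explicit adapter name, then scale its influence down.
pipe.load_lora_weights("path/to/your_lora.safetensors", adapter_name="my_lora")
pipe.set_adapters(["my_lora"], adapter_weights=[0.7])  # 0.6-0.8 per the tip above
```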
Lack of Face Detail?
Turbo models sometimes over-smooth skin. Add "highly detailed skin texture, raw photo" to your prompts, or add facial close-ups to the training data.