Z-Image
Lightweight Image Generation Engine
Z-Image is a lightweight image generation tool featuring an efficient 8-Step inference architecture. It delivers fast, high-quality AI image generation on consumer-grade GPUs while significantly reducing computational costs.
Example Showcase
- Cinematic Jazz Saxophonist
- Tokyo Rainy Night Street Documentary
- The Artisan Watchmaker
- Tang Dynasty Hanfu Portrait
- High-Fashion Texture
- Studio Ghibli Illustration
- Vintage Movie Poster "The Taste of Memory"
- Nature Magazine Cover
- Minimalist Chair Poster Design
Not Just Fast, It's Fully Evolved
Filling the gap between lightweight and massive models, Z-Image-Turbo finds the perfect balance between speed, quality, and usability.
Native Bilingual Support
Powered by the Qwen 3.4B LLM as its text encoder. No more garbled Chinese characters: calligraphy, signage, and complex typography are rendered precisely.
S3-DiT Single Stream
A radical architectural innovation: text and image tokens are processed in a single unified stream, similar to GPT-4, so every parameter is used for both generation and understanding.
Apache 2.0 License
True open-source freedom. Unlike Flux.1 Dev's non-commercial restrictions, you are free to use, modify, and integrate it commercially. Ideal for startups and game studios.
S3-DiT: Breaking Modal Barriers
Traditional diffusion models use a "dual-stream" architecture that routes text and image tokens through separate branches. Z-Image-Turbo adopts a Scalable Single-Stream Diffusion Transformer (S3-DiT); a minimal sketch of the idea follows the list below.
- Unified Input Stream: Text Tokens and Image Latents are concatenated directly.
- Full Parameter Interaction: Every Transformer layer performs deep text-image attention calculation.
- Decoupled-DMD: The core algorithm that compresses inference to just 8 steps.
- CFG Enhancement: Independently optimized guidance signals for sharp images without high CFG values.
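The unified-stream idea is easy to see in code. The following is a minimal, hypothetical PyTorch sketch, not the actual Z-Image-Turbo implementation (the class name, dimensions, and token counts are illustrative): text tokens and image latents are concatenated into one sequence and processed by a single set of weights.

```python
# Minimal single-stream sketch (illustrative only; not the real S3-DiT code).
# Text tokens and image latents share ONE sequence and ONE set of weights,
# so every parameter sees both modalities.
import torch
import torch.nn as nn

class SingleStreamBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.norm1(tokens)
        attn_out, _ = self.attn(h, h, h)  # joint text-image self-attention
        tokens = tokens + attn_out
        return tokens + self.mlp(self.norm2(tokens))

dim, heads = 512, 8
text_tokens = torch.randn(1, 77, dim)      # embedded prompt tokens
image_latents = torch.randn(1, 1024, dim)  # patchified image latents
stream = torch.cat([text_tokens, image_latents], dim=1)  # unified input stream
out = SingleStreamBlock(dim, heads)(stream)
print(out.shape)  # torch.Size([1, 1101, 512])
```

Because there is no separate cross-attention branch, the same parameters are used both to understand the prompt and to denoise the image.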
Why Choose Z-Image-Turbo?
We provide the optimal solution balancing performance, cost, and ecosystem.
| Metric | Z-Image-Turbo | Flux.1 (Dev) | SDXL Base |
|---|---|---|---|
| Parameters | 6B (Balanced) | 12B (Massive) | 2.6B |
| VRAM Requirement | 12GB (Native BF16) | 24GB+ (or Quantized) | 8GB |
| Inference Steps | 8 (Distilled) | 20-50 | 20-50 |
| Text Encoder | Qwen 3.4B (Bilingual) | T5 + CLIP | OpenCLIP |
| Typography | ⭐️⭐️⭐️⭐️⭐️ Perfect | ⭐️⭐️ Poor | ⭐️ Garbled |
| License | Apache 2.0 | Non-Commercial | OpenRAIL++ |
| Cost per Image | ~$0.0029 | High | Low |
A Boon for Consumer Hardware
Thanks to its 6B parameter scale and 8-step distillation, Z-Image-Turbo generates an image in roughly 2-3 seconds on an RTX 3090/4090. On enterprise H800 GPUs, sub-second response is realistic.
Quick Start
# Quick load with Diffusers
from diffusers import DiffusionPipeline
import torch

# Load the 8-step Turbo model
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate an image
image = pipe(
    prompt='Cyberpunk detective, rainy night, neon lights, Chinese sign saying "Tongyi Lab"',
    num_inference_steps=8,
    guidance_scale=1.0,  # distilled models don't need high CFG
).images[0]
image.save("output.png")
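To check the speed claims on your own hardware, you can time a single generation pass. This is a simple sketch that reuses the `pipe` object from the snippet above; the prompt is illustrative, and `torch.cuda.synchronize()` ensures the measurement includes all queued GPU work.

```python
# Rough latency check for the 8-step pipeline (reuses `pipe` from the Quick Start).
import time
import torch

torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(
    prompt="A ginger cat reading a newspaper in a sunlit cafe",
    num_inference_steps=8,
    guidance_scale=1.0,
).images[0]
torch.cuda.synchronize()
print(f"Generation took {time.perf_counter() - start:.2f}s")  # ~2-3s on RTX 3090/4090
```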
Frequently Asked Questions
Questions about deployment, usage, and licensing.
GPU requirements?
For native precision (BF16), 16GB VRAM (RTX 4080/3090) is recommended. With GGUF/NF4 quantization, 8GB VRAM cards (RTX 3060) run smoothly with minimal quality loss.
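If you are near the VRAM limit, the standard diffusers CPU-offload API is a simpler first step than quantization. The sketch below is illustrative; the GGUF/NF4 paths mentioned above depend on separate tooling and are not shown.

```python
# Lower peak VRAM by keeping idle sub-models on the CPU (built-in diffusers feature).
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # each component moves to the GPU only while it runs

image = pipe(
    prompt="A watercolor lighthouse at dawn, soft morning mist",
    num_inference_steps=8,
    guidance_scale=1.0,
).images[0]
image.save("lighthouse.png")
```

Offloading trades some generation speed for a lower memory footprint.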
Can I use it commercially?
Yes. Z-Image-Turbo uses the permissive Apache 2.0 license. You can use it freely for commercial products without fees.
How to write Chinese prompts?
Write them just as you would chat naturally. Thanks to Qwen 3.4B, you can use complex sentences, idioms, or poems. For text rendering, wrap the exact text you want in the image in quotes, as in the example below.
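For example, a Chinese prompt with quoted sign text might look like this; the prompt wording is illustrative and reuses the `pipe` object from the Quick Start.

```python
# Chinese prompt with the sign text wrapped in quotes, as recommended above.
# Roughly: "An antique teahouse entrance with a wooden sign reading '人间烟火',
# warm dusk light, cinematic feel."
image = pipe(
    prompt='古色古香的茶馆门口挂着木质招牌，上面写着"人间烟火"，黄昏暖光，电影质感',
    num_inference_steps=8,
    guidance_scale=1.0,
).images[0]
image.save("teahouse.png")
```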
Support for ComfyUI / WebUI?
Yes. ComfyUI has Day-0 support (update to the latest version). Automatic1111 support is in the dev branch and coming soon.
Advantage over Flux.1?
Z-Image-Turbo is built for efficiency and usability. While Flux excels at extreme quality, Z-Image offers roughly 3x the speed, half the VRAM usage, and superior Chinese support.