DiffusionGemma: How Google's New Open LLM Hits 1,000 Tokens/sec and Changes Inference Economics

TL;DR: Google released DiffusionGemma, an open Apache 2.0 diffusion-based LLM that generates text up to 4x faster than autoregressive models, hitting 1,000+ tokens/sec on a single H100 and fitting in 18 GB VRAM. It trades some accuracy for speed. Here is what that means in practice.

What DiffusionGemma Actually Is

Google DeepMind released DiffusionGemma, the first production-grade open-weight model that applies discrete diffusion to text generation. The same family of techniques behind image ge