The Big LLM Architecture Comparison
Sebastian Raschka, PhD
From DeepSeek V3 to GLM-5: A Look At Modern LLM Architecture Design
Last updated: Apr 2, 2026 (added Gemma 4 in section 23)
It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are.
Sure, positional embeddings have evolved from absolute to rotary (RoPE), Multi-Head...
