Beyond Standard LLMs
Sebastian Raschka, PhD
Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
From DeepSeek R1 to MiniMax-M2, the largest and most capable open-weight LLMs today remain autoregressive decoder-style transformers, which are built on flavors of the original multi-head attention mechanism.
However, we have also seen alternatives to standard LLMs pop up in recent years, from text diffusion models to the most recent linear attention hybrid architectures. Some of them, such as code world models and small recursive transformers, depart from the standard transformer recipe even further.
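To make the distinction concrete, below is a minimal, single-head sketch that contrasts standard softmax attention with a kernelized linear-attention variant. This is an illustrative toy (non-causal, with no projection layers or multiple heads), using the ELU-based feature map from Katharopoulos et al. (2020); it is not the exact mechanism of any specific model mentioned above.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention: the T x T score matrix
    # makes this quadratic in the sequence length T
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized (non-causal) linear attention in the style of
    # Katharopoulos et al. (2020): a positive feature map replaces the
    # softmax, so phi(Q) (phi(K)^T V) can be computed in O(T)
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = k.transpose(-2, -1) @ v  # fixed-size (d x d_v) key-value summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (q @ kv) / (z + eps)

# Toy input: batch size 1, sequence length 8, head dimension 16
q, k, v = (torch.randn(1, 8, 16) for _ in range(3))
print(softmax_attention(q, k, v).shape)  # torch.Size([1, 8, 16])
print(linear_attention(q, k, v).shape)   # torch.Size([1, 8, 16])
```

The key difference is that the key-value summary `kv` has a fixed size independent of the sequence length, which is what lets linear attention scale linearly with context length and why hybrid architectures interleave such layers with regular full-attention layers.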
