Life in the Singularity

How Attention Residuals are Rewiring the Modern LLM

Matt McDonagh
Mar 16, 2026

TL;DR

The foundational wiring of large language models just got a long-overdue upgrade. For years, LLM architectures have relied on standard residual connections, which accumulate every layer's output with fixed unit weights. This uniform aggregation lets the hidden state grow unchecked as the network gets deeper. Researchers from the Kimi Team have now introduced Attention Residuals: by applying softmax attention across the depth of the network, each layer selectively pulls the information it needs from earlier layers using learned, input-dependent weights. To make this scale, they also built Block Attention Residuals, which group layers into blocks and sharply reduce the memory footprint. The result is an architecture that matches the performance of a standard model trained with 1.25x more compute.

This is a smarter, leaner, and fundamentally superior way to build a neural network.
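
To make the contrast in the TL;DR concrete, here is a minimal NumPy sketch for a single token. It is an illustration under stated assumptions, not the Kimi Team's implementation: the per-layer `query` vector, the scaled dot-product scoring, and the choice to summarize each block by summing its layers are all hypothetical stand-ins, since the summary above only says that softmax attention is applied across depth and that layers are chunked into blocks.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def standard_residual(layer_outputs):
    """Standard residual stream: every earlier output is summed with weight 1."""
    return layer_outputs.sum(axis=0)


def attention_residual(layer_outputs, query):
    """Softmax-weighted mix over depth (illustrative formulation, an assumption).

    layer_outputs: (L, d) array, one hidden vector per earlier layer.
    query:         (d,) hypothetical learned query for the current layer.
    The weights depend on the input because the layer outputs act as keys.
    """
    d = layer_outputs.shape[-1]
    scores = layer_outputs @ query / np.sqrt(d)  # (L,) one score per layer
    weights = softmax(scores)                    # (L,) non-negative, sums to 1
    return weights @ layer_outputs, weights


def block_attention_residual(layer_outputs, query, block_size):
    """Attend over block summaries instead of individual layers.

    Assumption: each block is summarized by summing its layers, so only
    ceil(L / block_size) vectors need to be kept instead of L.
    """
    num_layers, d = layer_outputs.shape
    num_blocks = -(-num_layers // block_size)    # ceiling division
    pad = num_blocks * block_size - num_layers
    padded = np.concatenate([layer_outputs, np.zeros((pad, d))], axis=0)
    blocks = padded.reshape(num_blocks, block_size, d).sum(axis=1)
    return attention_residual(blocks, query)


rng = np.random.default_rng(0)
outputs = rng.normal(size=(8, 16))   # 8 earlier layers, hidden size 16
query = rng.normal(size=16)

summed = standard_residual(outputs)
mixed, weights = attention_residual(outputs, query)
block_mixed, block_weights = block_attention_residual(outputs, query, block_size=4)

print("norm of plain sum:      ", np.linalg.norm(summed))
print("norm of attention mix:  ", np.linalg.norm(mixed))
print("depth weights:          ", np.round(weights, 3))
print("block weights (2 blocks):", np.round(block_weights, 3))
```

Because the softmax weights form a convex combination, the mixed vector's norm can never exceed that of the largest contributing layer output, whereas the plain sum has no such bound. That is one way to read the claim about uncontrolled hidden-state growth. In this sketch the block variant stores two summary vectors instead of eight layer outputs, which is the source of the memory saving.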

The Background

To understand why this is a monumental shift in AI …
