arXiv:2505.10465v1 Announce Type: cross
Abstract: The success of today’s large language models (LLMs) rests on the observation that larger models perform better. However, the origin of this neural scaling law, the finding that loss decreases as a power law with model size, remains unclear. Starting from two empirical principles, namely that LLMs represent more features than they have dimensions (widths), so that representations are superposed, and that words or concepts in language occur with varying frequencies, we constructed a toy model to study how loss scales with model size. We found that when superposition is weak, meaning only the most frequent features are represented without interference, the scaling of loss with model size depends on the underlying feature frequencies; if the feature frequencies follow a power law, so does the loss. In contrast, under strong superposition, where all features are represented but overlap with one another, the loss becomes inversely proportional to the model dimension across a wide range of feature frequency distributions. This robust scaling behavior is explained geometrically: when many more vectors are packed into a lower-dimensional space, the interference (squared overlaps) between vectors scales inversely with that dimension. We then analyzed four families of open-source LLMs and found that they exhibit strong superposition and quantitatively match the predictions of our toy model. The Chinchilla scaling law also agrees with our results. We conclude that representation superposition is an important mechanism underlying the observed neural scaling laws. We anticipate that these insights will inspire new training strategies and model architectures that achieve better performance with less computation and fewer parameters.
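The geometric argument behind the strong-superposition regime can be checked numerically: random unit vectors in a d-dimensional space have expected squared overlap of about 1/d. The sketch below is only an illustration of that fact, not the authors' toy model; the number of features and the choice of dimensions are arbitrary.

```python
import numpy as np

# Minimal sketch (assumption: random unit directions stand in for superposed
# feature vectors). For pairs of independent random unit vectors in d
# dimensions, the mean squared overlap is ~1/d, which is the geometric fact
# the abstract invokes to explain loss scaling as 1/width under strong
# superposition.
rng = np.random.default_rng(0)
n_features = 2048  # hypothetical: many more features than dimensions

for d in (64, 128, 256, 512):
    vecs = rng.standard_normal((n_features, d))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize to unit length
    gram = vecs @ vecs.T                                  # pairwise overlaps
    off_diag = gram[~np.eye(n_features, dtype=bool)]      # exclude self-overlaps
    interference = (off_diag ** 2).mean()                 # mean squared overlap
    print(f"d={d:4d}  mean squared overlap={interference:.5f}  (1/d={1/d:.5f})")
```

Running this shows the mean squared overlap tracking 1/d as the dimension grows, consistent with the inverse-dimension loss scaling described in the abstract.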