I Measured Neural Network Training Every 5 Steps for 10,000 Iterations

by AiNEWS2025
2025-11-15
in Machine Learning


I thought I understood how neural networks learned. Train them, watch the loss go down, save checkpoints every epoch. Standard workflow. Then I measured training dynamics at 5-step intervals instead of at epoch level, and everything I thought I knew fell apart.

The question that started this journey: Does a neural network’s capacity expand during training, or is it fixed from initialization? Until 2019, we all assumed the answer was obvious—parameters are fixed, so capacity must be fixed too. But Ansuini et al. discovered something that shouldn’t be possible: the effective representational dimensionality increases during training. Yang et al. confirmed it in 2024.

This changes everything. If learning space expands while the network learns, how can we mechanistically understand what it’s actually doing?

High-Frequency Training Checkpoints

When training a DNN for 10,000 steps, we typically set checkpoints every 100 or 200 steps. Measuring at 5-step intervals generates far more records than is convenient to manage, but these high-frequency checkpoints reveal valuable information about how a DNN learns.

High-frequency checkpoints provide information about:

  • Whether early training mistakes can be recovered from (they often can’t)
  • Why some architectures work and others fail
  • When interpretability analysis should happen (spoiler: way earlier than we thought)
  • How to design better training approaches

During an applied research project, I measured DNN training at high resolution — every 5 steps instead of every 100 or 500 — using a basic MLP architecture with the same dataset I have used for the last 10 years.
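The checkpointing pattern itself is easy to retrofit onto any training loop. Here is a minimal sketch using a toy quadratic objective and plain gradient descent — purely illustrative, since the actual experiment used an MLP and the author's own tooling:

```python
import numpy as np

def train_with_checkpoints(steps=10_000, interval=5, lr=0.1, dim=16, seed=0):
    """Toy gradient-descent loop that snapshots parameters every `interval` steps."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)       # parameters of a toy quadratic objective
    target = rng.normal(size=dim)
    checkpoints = []
    for step in range(steps):
        grad = 2 * (w - target)    # gradient of ||w - target||^2
        w -= lr * grad
        if step % interval == 0:
            checkpoints.append((step, w.copy()))
    return checkpoints

ckpts = train_with_checkpoints(steps=10_000, interval=5)
# 10,000 steps at 5-step intervals -> 2,000 snapshots, vs. 100 at a
# conventional 100-step interval: a 20x increase in temporal resolution.
```

In a real run you would snapshot activations on a fixed probe batch (not just weights), and stream snapshots to disk rather than holding them in memory.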

Figure 1: Experimental setup. We detect discrete transitions using z-score analysis with rolling statistics.
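One plausible reading of that detection procedure — my reconstruction, with the window size and threshold as assumptions rather than values from the article — flags steps whose change is an outlier relative to the rolling statistics of recent changes:

```python
import numpy as np

def detect_transitions(series, window=20, z_thresh=3.0):
    """Flag indices where the change in `series` is a rolling-statistics outlier.

    A step counts as a 'transition' if its first difference deviates from the
    rolling mean of the preceding `window` differences by more than
    `z_thresh` rolling standard deviations.
    """
    diffs = np.diff(series)
    hits = []
    for i in range(window, len(diffs)):
        past = diffs[i - window:i]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and abs(diffs[i] - mu) / sigma > z_thresh:
            hits.append(i + 1)  # index into the original series
    return hits

# Smooth ramp with one abrupt jump at index 60; the jump is flagged.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100) + 0.001 * rng.normal(size=100)
x[60:] += 1.0
hits = detect_transitions(x)   # index 60 appears among the detections
```

Note the window immediately after a jump has inflated variance, which naturally suppresses double-counting of the same transition.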

The results were surprising. Deep neural networks, even simple architectures, expand their effective parameter space during training. I had assumed this space was predetermined by the architecture itself. Instead, DNNs undergo discrete transitions—small jumps that increase the effective dimensionality of their learning space.

Figure 2: Effective dimensionality of activation patterns during training, measured using stable rank. Three distinct phases emerge: initial collapse (steps 0-300) where dimensionality drops from 2500 to 500, an expansion phase (steps 300-5000) where dimensionality climbs to 1000, and stabilization (steps 5000-8000) where dimensionality plateaus. This suggests steps 0-2000 constitute a qualitatively distinct developmental window. Image by author.

Figure 2 shows the monitoring of activation effective dimensionality during training. The transitions concentrate in the first 25% of training and are hidden at larger checkpoint intervals (100-1000 steps); we needed high-frequency checkpointing (every 5 steps) to detect most of them. The curve also shows interesting behavior. The initial collapse represents loss-landscape restructuring, where random initialization gives way to task-aligned structure. An expansion phase with gradual dimensionality growth follows. Finally, the stabilization reflects the architectural capacity limits of the DNN.
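Stable rank, the dimensionality measure named in the Figure 2 caption, has a standard definition: the squared Frobenius norm over the squared spectral norm, sr(A) = ||A||_F^2 / ||A||_2^2 = sum(s_i^2) / max(s_i)^2 over the singular values s_i. A minimal sketch of computing it from a matrix of activations (my implementation, not the article's code):

```python
import numpy as np

def stable_rank(A: np.ndarray) -> float:
    """Stable rank: sum of squared singular values over the largest one squared."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(np.sum(s**2) / s[0]**2)

# An identity matrix spreads energy over all directions equally:
print(stable_rank(np.eye(50)))   # -> 50.0
# A rank-1 matrix collapses to a single direction:
u = np.ones((50, 1))
print(stable_rank(u @ u.T))      # -> 1.0
```

Unlike the exact matrix rank, stable rank is robust to small singular values from noise, which is why it is a common choice for tracking effective dimensionality of activation matrices.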

Figure 3: Representational dimensionality (measured using stable rank) shows strong negative correlation with loss (ρ = −0.951) and moderate negative correlation with gradient magnitude (ρ = −0.701). As loss decreases from 2.0 to near zero, dimensionality expands from 9.0 to 9.6. Counterintuitively, improved performance correlates with expanded rather than compressed representations. Image by author.

This changes how we should think about DNN training, interpretability, and architecture design.

Exploration vs Expansion

Consider the following two scenarios:

Scenario A: Fixed Capacity (Exploration). Your network starts with a fixed representational capacity. Training explores different regions of this predetermined space. It is like navigating a map that exists from the beginning. Early training just means "haven't found the good region yet."

Scenario B: Expanding Capacity (Innovation). Your network starts with minimal capacity. Training creates representational structures. It's like building roads while traveling — each road enables new destinations. Early training establishes what becomes learnable later.

Which is it?

The question matters because if capacity expands, then early training isn't recoverable: you can't just "train longer" to fix early mistakes. It also means interpretability has a timeline, with features forming in sequence; understanding this sequence is key. Furthermore, architecture design becomes about expansion rate, not just final capacity. Finally, critical periods exist: if we miss the window, we miss the capability.

When We Need to Measure High-Frequency Checkpoints

Expansion vs Exploration

Figure 4: High-frequency vs. low-frequency sampling in the experiment described in Figure 1. We detect discrete transitions using z-score analysis with rolling statistics. High-frequency sampling captures rapid transitions that coarse-grained measurement misses. This comparison tests whether temporal resolution affects observable dynamics.

As seen in Figures 2 and 3, high-frequency sampling reveals interesting information. We can identify three distinct phases:

Phase 1: Collapse (steps 0-300). The network restructures from random initialization. Dimensionality drops sharply as the loss landscape is reshaped around the task. This isn't learning yet; it's preparation for learning.

Phase 2: Expansion (steps 300-5,000). Dimensionality climbs steadily. This is capacity expansion: the network is building representational structures — simple features that enable complex features that enable higher-order features.

Phase 3: Stabilization (steps 5,000-8,000). Growth plateaus. Architectural constraints bind. The network refines what it has rather than building new capacity.

This plot reveals expansion, not exploration. The network at step 5,000 can represent functions that were impossible at step 300, because those functions didn't exist yet.

Capacity Expands, Parameters Don’t

Figure 5: Comparison of activation space to weight space. Weight space dimensionality remains nearly constant (9.72-9.79), with only one detected "jump" across 8,000 steps. Image by author.

The comparison between activation and weight space shows that the two follow different dynamics under high-frequency sampling. The activation space shows approximately 85 discrete jumps (including Gaussian noise); the weight space shows only one, for the same network and the same training run. It confirms that the network at step 8,000 computes functions inaccessible at step 500 despite an identical parameter count. This is the clearest evidence for expansion.

DNNs innovate by generating new options in function space during training in order to represent complex tasks.

Transitions Are Fast and Early

We have seen how high-frequency sampling reveals many more transitions; low-frequency checkpointing would miss nearly all of them. These transitions concentrate early: two thirds of all transitions happen in the first 2,000 steps, just 25% of total training time. If we want to understand which features form and when, we need to look during steps 0-2,000, not at convergence. By step 5,000, the story is over.

Expansion Couples to Optimization

Looking again at Figure 3: as loss decreases, dimensionality expands. The network doesn't simplify as it learns; it becomes more complex. Dimensionality correlates strongly with loss (ρ = -0.951) and moderately with gradient magnitude (ρ = -0.701). This seems counterintuitive: improved performance correlates with expanded rather than compressed representations. We might expect networks to find simpler, more compressed representations as they learn. Instead, they expand into higher-dimensional spaces.
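The quoted ρ values are Spearman rank correlations, which you can recompute from any pair of tracked series. A self-contained sketch (`scipy.stats.spearmanr` is the usual route; this version avoids the dependency and, for brevity, assumes no tied values):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Assumes distinct values in each series; ties would need averaged ranks.
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Loss falling while dimensionality rises -> strong negative correlation.
loss = np.array([2.0, 1.2, 0.7, 0.3, 0.1, 0.05])
dim  = np.array([9.0, 9.2, 9.3, 9.45, 9.55, 9.6])
print(spearman_rho(loss, dim))   # -> -1.0 (perfectly monotone toy example)
```

The toy series here are invented to mirror the ranges in Figure 3, not the article's data; real traces would give intermediate values like the reported −0.951.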

Why?

A possible explanation is that complex tasks require complex representations. The network doesn't find a simpler explanation; it builds the representational structure needed to separate classes, recognize patterns, and generalize.

Practical Deployment

We have seen a different way to understand and debug DNN training across any domain.

If we know when features form during training, we can analyze them as they crystallize rather than reverse-engineering a black box afterward.

In real deployment scenarios, we can track representational dimensionality in real-time, detect when expansion phases occur, and run interpretability analyses at each transition point. This tells us precisely when our network is building new representational structures—and when it’s finished. The measurement approach is architecture-agnostic: it works whether you’re training CNNs for vision, transformers for language, RL agents for control, or multimodal models for cross-domain tasks.

Example 1: Intervention experiments that map causal dependencies. Disrupt training during specific windows and measure which downstream capabilities are lost. If corrupting data during steps 2,000-5,000 permanently damages texture recognition but the same corruption at step 6,000 has no effect, you’ve found when texture features crystallize and what they depend on. This works identically for object recognition in vision models, syntactic structure in language models, or state discrimination in RL agents.
Example 2: For production deployment, continuous dimensionality monitoring catches representational problems during training when you can still fix them. If layers stop expanding, you have architectural bottlenecks. If expansion becomes erratic, you have instability. If early layers saturate while late layers fail to expand, you have information flow problems. Standard loss curves won’t show these issues until it’s too late—dimensionality tracking surfaces them immediately.
Example 3: The architecture design implications are equally practical. Measure expansion dynamics during the first 5-10% of training across candidate architectures. Select for clean phase transitions and structured bottom-up development. These networks aren’t just more performant—they’re fundamentally more interpretable because features form in clear sequential layers rather than tangled simultaneity.
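The health checks in Example 2 reduce to simple rules over a tracked dimensionality trace. A hedged sketch with invented thresholds (window, slope, and variability limits are illustrative; tune them for your own runs):

```python
import numpy as np

def diagnose_expansion(dim_series, window=50, min_slope=1e-4, max_cv=0.5):
    """Crude training-health checks on a per-layer dimensionality trace.

    - 'stalled': average growth per step over the recent window is below
      min_slope (possible architectural bottleneck during what should be
      the expansion phase)
    - 'erratic': step-to-step changes vary too much relative to their mean
      (possible training instability)
    """
    recent = np.asarray(dim_series[-window:], dtype=float)
    diffs = np.diff(recent)
    slope = diffs.mean()
    flags = []
    if slope < min_slope:
        flags.append("stalled")
    if abs(slope) > 0 and diffs.std() / abs(slope) > 1 / max_cv:
        flags.append("erratic")
    return flags

steady = np.linspace(500, 1000, 200)   # clean expansion phase
flat = np.full(200, 750.0)             # expansion has stopped
print(diagnose_expansion(steady))      # -> []
print(diagnose_expansion(flat))        # -> ['stalled']
```

A check like this would run alongside the loss curve during the expansion phase specifically; a plateau late in training is expected stabilization, not a fault.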

What’s Next

So we’ve established that networks expand their representational space during training, that we can measure these transitions at high resolution, and that this opens new approaches to interpretability and intervention. The natural question: can you apply this to your own work?

I’m releasing the complete measurement infrastructure as open source. I included validated implementations for MLPs, CNNs, ResNets, Transformers, and Vision Transformers, with hooks for custom architectures.

Everything runs with three lines added to your training loop.

The GitHub repository provides experiment templates for the experiments discussed above: feature formation mapping, intervention protocols, cross-architecture transfer prediction, and production monitoring setups. The measurement methodology is validated. What matters now is what you discover when you apply it to your domain.

Try it:

pip install ndtracker

Quickstart, instructions, and examples in the repository: Neural Dimensionality Tracker (NDT)

The code is production-ready. The protocols are documented. The questions are open. I would like to see what you find when you measure your training dynamics at high resolution, whatever your context and architecture.

You can share your results, open issues with your findings, or just ⭐️ the repo if this changes how you think about training. Remember, the interpretability timeline exists across all neural architectures.

Javier Marín | LinkedIn | Twitter


References & Further Reading

  • Achille, A., Rovere, M., & Soatto, S. (2019). Critical learning periods in deep networks. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=BkeStsCcKQ
  • Frankle, J., Dziugaite, G. K., Roy, D. M., & Carbin, M. (2020). Linear mode connectivity and the lottery ticket hypothesis. In Proceedings of the 37th International Conference on Machine Learning (pp. 3259-3269). PMLR. https://proceedings.mlr.press/v119/frankle20a.html
  • Ansuini, A., Laio, A., Macke, J. H., & Zoccolan, D. (2019). Intrinsic dimension of data representations in deep neural networks. In Advances in Neural Information Processing Systems (Vol. 32, pp. 6109-6119). https://proceedings.neurips.cc/paper/2019/hash/cfcce0621b49c983991ead4c3d4d3b6b-Abstract.html
  • Yang, J., Zhao, Y., & Zhu, Q. (2024). ε-rank and the staircase phenomenon: New insights into neural network training dynamics. arXiv preprint arXiv:2412.05144. https://arxiv.org/abs/2412.05144
  • Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature visualization. Distill, 2(11), e7. https://doi.org/10.23915/distill.00007
  • Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., DasSarma, N., Drain, D., Ganguli, D., Hatfield-Dodds, Z., Hernandez, D., Jones, A., Kernion, J., Lovitt, L., Ndousse, K., Amodei, D., Brown, T., Clark, J., Kaplan, J., McCandlish, S., & Olah, C. (2021). A mathematical framework for transformer circuits. Transformer Circuits Thread. https://transformer-circuits.pub/2021/framework/index.html


Tags: deep learning, Deep Neural Networks, Explainable AI, Interpretable ML, Neural Network