When Transformers Sing: Adapting SpectralKD for Text-Based Knowledge Distillation

AiNEWS2025 by AiNEWS2025
2025-10-24
in Machine Learning


While working on a knowledge distillation problem for intent classification, I hit a puzzling roadblock. My setup involved a teacher model, RoBERTa-large fine-tuned on my intent classification task, and a student model that I was trying to train without losing too much accuracy relative to the teacher.

I experimented with multiple layer-mapping techniques: connecting every second teacher layer to a student layer, averaging two teacher layers into one, and even assigning custom weights (for example, 0.3 to l1 and 0.7 to l2). But no matter what combination I tried, the student's accuracy never matched the teacher's.
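To make these mapping strategies concrete, here is a minimal NumPy sketch. It is not the code from my project: the 24-to-12 layer ratio, the tensor sizes, and the function names are assumptions chosen only for illustration, and the "hidden states" are random numbers.

```python
import numpy as np

def every_second_layer(teacher_states):
    """Use teacher layers 2, 4, 6, ... (1-based) as targets for the student layers."""
    return teacher_states[1::2]

def weighted_pairs(teacher_states, w_first=0.5, w_second=0.5):
    """Blend consecutive teacher layers (1,2), (3,4), ... into one target each."""
    first = teacher_states[0::2]
    second = teacher_states[1::2]
    return w_first * first + w_second * second

# Toy stand-in for teacher hidden states: (layers, tokens, hidden size).
rng = np.random.default_rng(0)
teacher_states = rng.normal(size=(24, 16, 32))

skip_targets = every_second_layer(teacher_states)          # every 2nd layer
mean_targets = weighted_pairs(teacher_states)               # plain average of two layers
custom_targets = weighted_pairs(teacher_states, 0.3, 0.7)   # 0.3 * l1 + 0.7 * l2
print(skip_targets.shape, mean_targets.shape, custom_targets.shape)
```

All three strategies fix the mapping by position alone, which is exactly the limitation that pushed me to look for a way of measuring which teacher layers matter.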

That’s when I started exploring how to map the most informative layers to my student model so that the student can maximize its performance. I wanted a way to quantify which layer of the teacher model truly matters for distillation.

In that search, I stumbled upon a fascinating paper, “SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis,” which tackled a similar problem but in the image domain. The authors use a spectral analysis approach (SpectralKD) to align the teacher and student models more intelligently.

Curious, I decided to adapt the idea to text data – and BOOM!!!, it actually worked! For the first time, my student model started thinking almost like its teacher.

Source: Author

Here’s the layer intensity graph of my fine-tuned RoBERTa-large model. Based on the spectral insights, I selected layers 1–9 and 21–23 for my student model during knowledge distillation, the ones carrying the richest information.

I can’t share my dataset or code for confidentiality reasons, but I’ll walk you through how the paper’s image-based approach inspired my text-based adaptation, and how you can think about doing the same.


Behind the Scenes: How FFT Reveals a Model’s Spectral Soul

So, let’s start with spectral intensity, and slowly dive into the real magician here: the Fast Fourier Transform (FFT).

In the SpectralKD paper, the authors introduce a framework that lets us see not just what Vision Transformers (ViTs) predict, but also how information flows through their layers. Instead of relying on intuition or visualisation, they use spectral analysis, a way to measure the frequency richness of the model’s internal representations.

Imagine each Transformer layer as a musician in an orchestra: some layers play high notes (fine details), while others play low notes (broad features). The FFT lets us listen to each player separately and pick out the ones with the strongest melodies, i.e., the most information-rich signals.

Source: Author

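Step 1: Feature maps, the raw material

Each layer produces a feature map

$$X \in \mathbb{R}^{B \times C \times H \times W}$$

where:

B is the batch size,
C is the number of channels, and
H, W are the spatial height and width.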

Step 2: Applying the Fourier transform

The authors apply a 1-dimensional FFT along the channel dimension to translate these real-valued activations into the frequency domain:
F(X)=FFT(X)

This means:
For every spatial location (b, h, w), a 1D FFT is computed across all channels.
The result is a complex-valued tensor (since FFT outputs real + imaginary parts).
F(X) therefore tells us how much of each frequency is present in that layer’s representation.
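As a quick illustration of this step, the sketch below applies NumPy's FFT along the channel axis of a random tensor. The sizes and the random data are assumptions for illustration only, not real ViT activations.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 2, 8, 4, 4               # toy sizes, chosen arbitrarily
X = rng.normal(size=(B, C, H, W))     # stand-in for a layer's feature map

# 1D FFT across the channel axis (axis=1) at every (b, h, w) location.
F = np.fft.fft(X, axis=1)

print(F.shape)                # same shape as X
print(np.iscomplexobj(F))     # the output carries real and imaginary parts
```

The output keeps the shape of X, but index c along axis 1 now refers to a frequency bin rather than a raw channel.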

And if you’re wondering, “Why FFT though?” — hold that thought.
Because later in this blog, we’re going to uncover exactly why FFT is the perfect tool to measure a model’s inner intensity.

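Step 3: Measuring frequency strength

The strength of each frequency component is the magnitude of its complex value:

$$A(X) = \sqrt{\mathrm{Re}(F(X))^2 + \mathrm{Im}(F(X))^2}$$

where:

Re(F(X)) is the real part,
Im(F(X)) is the imaginary part.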

Step 4: Averaging across the map

Now we want to summarize this intensity across all positions in the layer:

$$I_c(X) = \frac{1}{B \cdot H \cdot W} \sum_{b=1}^{B} \sum_{h=1}^{H} \sum_{w=1}^{W} A(X)_{b,c,h,w}$$

where A(X) denotes the frequency magnitude from Step 3. This step gives us the average intensity of a single channel.

You can then simply average over all channels. Voilà! Now you have the spectral intensity of a single layer of the Vision Transformer.
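Putting the four steps together, a minimal sketch might look like the following. The function name `layer_spectral_intensity` is hypothetical and the input is synthetic; this is my reading of the steps described above, not the paper's reference implementation.

```python
import numpy as np

def layer_spectral_intensity(X):
    """Spectral intensity of one layer from a feature map X of shape (B, C, H, W)."""
    F = np.fft.fft(X, axis=1)               # Step 2: FFT across channels
    A = np.abs(F)                           # Step 3: sqrt(Re^2 + Im^2)
    per_channel = A.mean(axis=(0, 2, 3))    # Step 4: average over batch and space -> (C,)
    return per_channel, float(per_channel.mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 8, 4, 4))           # synthetic feature map, not real activations
per_channel, layer_score = layer_spectral_intensity(X)
print(per_channel.shape, layer_score)
```

Running this once per layer gives one scalar per layer, which is what an intensity-per-layer plot shows.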


Peeking into the Frequency Realm: The Fourier Lens of SpectralKD

Let’s look at the discrete Fourier transform that the FFT computes:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}$$

xₙ is the input sequence (your signal, feature, or activation pattern).
Xₖ is the frequency component at frequency index k.
N is the number of points in the sequence (i.e., number of channels or features).

Each term e⁻ʲ²πᵏⁿ/ᴺ acts as a rotating phasor, a tiny complex wave spinning through the signal space, and together, they form one of the most beautiful ideas in signal processing.

Source: Author (a rotating phasor e⁻ʲ²πᵏⁿ/ᴺ multiplied by g(t) in the complex plane)
Source: Author (averaging all the points in the complex plane gives the center of mass of the rotating signal, which peaks only at a specific frequency k; in the example above, k = 3)

OMG! What just happened here? Let me break it down.

When you multiply your hidden activations xₙ (say, across channels or feature dimensions) by this phasor, you’re essentially asking:

“Hey, layer, how much of the k-th type of variation do you contain in your representations?”

Each frequency k corresponds to a distinct pattern scale across the feature dimensions.

Lower k values capture broad, smooth semantic structures (like topic-level context), while higher k values capture rapid, fine-grained variations (like token-level nuances or syntactic signals).

Now here’s the fun part: if a layer resonates with a particular frequency pattern, the phasor’s rotation lines up with the signal, and the sum in the Fourier formula produces a strong response for that k.

If not, the rotations cancel out, meaning that frequency doesn’t play a big role in that layer’s representation.
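You can watch this resonance and cancellation happen with a toy signal. The sketch below builds a cosine that completes three cycles over N points, evaluates the Fourier sum directly with rotating phasors, and checks where the response peaks. The signal and the helper name `dft` are illustrative assumptions.

```python
import numpy as np

def dft(x):
    """Direct evaluation of X_k = sum_n x_n * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    phasors = np.exp(-2j * np.pi * k * n / N)   # one rotating phasor per (k, n)
    return phasors @ x

N = 32
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N)   # a pure pattern with 3 cycles

X = dft(x)
mag = np.abs(X)
peak = int(np.argmax(mag[: N // 2]))   # look at the non-mirrored half
print(peak)   # prints 3
```

At k = 3 the phasor and the signal line up and the sum grows to N/2; at every other k in the first half the terms cancel to roughly zero. For a real-valued signal the spectrum is mirrored, so the same peak also appears at k = N - 3.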

So, the Fourier Transform isn’t adding anything new; it is just finding out how our layer encodes information across different scales of abstraction.

It’s like zooming out and realizing:

  • Some layers hum quietly with smooth, conceptual meanings (low frequencies),
  • Others buzz with sharp, detailed interactions between tokens (high frequencies).

The FFT basically turns a layer’s hidden states into a frequency fingerprint — a map of what kinds of information that layer is focusing on.

And that’s exactly what SpectralKD uses to figure out which layers are actually doing the heavy lifting during knowledge distillation.

If you still need the visualization and more intuition of the Fourier transform, you can just go through the 3Blue1Brown Video, “But what is the Fourier Transform? A visual introduction.”


From Vision to Language: How Spectral Intensity Guided My Intent Classifier

Source: Author

Let a layer activation tensor be:

$$X \in \mathbb{R}^{N \times L \times H}$$

where:

  • N = number of samples (batch size)
  • L = sequence length (number of tokens/time steps)
  • H = hidden dimension (number of channels/features produced by the layer)

Each sample i has an activation matrix Xᵢ ∈ ℝᴸˣᴴ (sequence positions × hidden features).

Now, again, you can compute the FFT of Xᵢ, measure the frequency length (the magnitude) from the real and imaginary components, average across the channels, and then repeat this for each layer.

Frequency length:

Frequency across channels:

Frequency across a layer:

Here, K is the number of bins retained.
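Since I can't share the original code, here is a hedged sketch of how such a text adaptation could look. Several choices are assumptions made only for this illustration: the FFT is taken along the hidden dimension (mirroring the paper's channel-wise FFT), `np.fft.rfft` is used because the activations are real-valued, the first K bins are the ones retained, and the "layers" are synthetic random tensors with made-up scales.

```python
import numpy as np

def text_layer_intensity(X, K=None):
    """Scalar spectral intensity for activations X of shape (N, L, H).

    Assumption: FFT along the hidden axis, keeping the first K frequency bins.
    """
    F = np.fft.rfft(X, axis=-1)          # (N, L, H // 2 + 1), complex
    A = np.abs(F)                        # frequency magnitude
    if K is not None:
        A = A[..., :K]                   # retain K bins
    per_bin = A.mean(axis=(0, 1))        # average over samples and tokens
    return float(per_bin.mean())         # average over retained bins

rng = np.random.default_rng(0)
# Six hypothetical layers; the scales are invented so that some layers are "louder".
scales = (2.0, 1.5, 0.5, 0.4, 0.6, 1.8)
layers = [rng.normal(scale=s, size=(4, 10, 16)) for s in scales]

scores = np.array([text_layer_intensity(X, K=8) for X in layers])
top = np.sort(np.argsort(scores)[::-1][:3])   # 0-based indices of the 3 strongest layers
print(scores.round(2), top)
```

With a real model you would replace the synthetic tensors with the hidden states of each layer collected over a batch of sentences, then use the ranking to decide which teacher layers to align with the student.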


Conclusion

The SpectralKD authors’ analysis shows two major insights:

  1. Not all layers contribute equally. In uniform transformer architectures, only a few early and final layers show strong spectral activity, the true “hotspots” of information flow.
  2. Different transformer types, similar melodies. Despite architectural variations, both hierarchical and uniform transformers share surprisingly similar spectral patterns, hinting at a universal way these models learn and represent knowledge.

Building on these findings, SpectralKD introduces a simple, parameter-free knowledge distillation (KD) strategy. By selectively aligning the spectral behavior of early and final layers between a teacher and a student model, the student learns to mimic the teacher’s spectral signature, even in intermediate layers that were never explicitly aligned.

The results in the paper are striking: the distilled student (DeiT-Tiny) doesn’t just match performance on benchmarks like ImageNet-1K, it also learns to think spectrally like the teacher, capturing both local and global information with remarkable fidelity.

Ultimately, SpectralKD bridges interpretability and distillation, offering a fresh way to visualize what happens inside transformers during learning. It opens a new line of research that the authors call “distillation dynamics”, a journey into how knowledge itself flows, oscillates, and harmonizes between teacher and student networks.

