
The Machine Learning “Advent Calendar” Day 24: Transformers for Text in Excel



Welcome to Day 24 of my Machine Learning Advent Calendar.

Before closing this series, I would like to sincerely thank everyone who followed it, shared feedback, and supported it, in particular the Towards Data Science team.

Ending this calendar with Transformers is not a coincidence. The Transformer is not just a fancy name. It is the backbone of modern Large Language Models.

There is a lot to say about RNNs, LSTMs, and GRUs. They played a key historical role in sequence modeling. But today, modern LLMs are overwhelmingly based on Transformers.

The name Transformer itself marks a break with convention. From a naming perspective, the authors could have chosen something like Attention Neural Networks, in line with Recurrent Neural Networks or Convolutional Neural Networks. As a Cartesian mind, I would have appreciated a more consistent naming scheme. But naming aside, the conceptual shift introduced by Transformers fully justifies the distinction.

Transformers can be used in different ways. Encoder architectures are commonly used for classification. Decoder architectures are used for next-token prediction, and therefore for text generation.

In this article, we will focus on one core idea only: how the attention matrix transforms input embeddings into something more meaningful.

In the previous article, we introduced 1D Convolutional Neural Networks for text. We saw that a CNN scans a sentence using small windows and reacts when it recognizes local patterns. This approach is already very powerful, but it has a clear limitation: a CNN only looks locally.

Today, we move one step further.

The Transformer answers a fundamentally different question.

What if every word could look at all the other words at once?

1. The same word in two different contexts

To understand why attention is needed, we will start with a simple idea.

We will use two different input sentences, both containing the word mouse, but used in different contexts.

In the first input, mouse appears in a sentence with cat. In the second input, mouse appears in a sentence with keyboard.

Transformers in Excel – all images by author

At the input level, we deliberately use the same embedding for the word “mouse” in both cases. This is important. At this stage, the model does not know which meaning is intended.

The embedding for mouse contains both:

  • a strong animal component
  • a strong tech component

This ambiguity is intentional. Without context, mouse could refer to an animal or to a computer device.

All other words provide clearer signals. Cat is strongly animal. Keyboard is strongly tech. Words like “and” or “are” mainly carry grammatical information. Words like friends and useful are weakly informative on their own.

At this point, nothing in the input embeddings allows the model to decide which meaning of mouse is correct.
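
To make this concrete, here is a minimal sketch of such an embedding table in Python. The four dimensions and every numeric value are illustrative assumptions for the toy sentences, not the values used in the article's Excel file:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings: [animal, tech, grammar, filler].
# All values are made up for illustration; the Excel sheet uses its own.
emb = {
    "cat":      np.array([0.9, 0.0, 0.0, 0.1]),  # strongly animal
    "keyboard": np.array([0.0, 0.9, 0.0, 0.1]),  # strongly tech
    "mouse":    np.array([0.5, 0.5, 0.0, 0.0]),  # ambiguous: animal AND tech
    "and":      np.array([0.0, 0.0, 0.9, 0.0]),  # mainly grammatical
    "are":      np.array([0.0, 0.0, 0.9, 0.0]),  # mainly grammatical
    "friends":  np.array([0.3, 0.0, 0.1, 0.6]),  # weakly informative
    "useful":   np.array([0.0, 0.3, 0.1, 0.6]),  # weakly informative
}

# The exact same vector represents "mouse" in both sentences:
sentence_a = ["cat", "and", "mouse", "are", "friends"]
sentence_b = ["keyboard", "and", "mouse", "are", "useful"]
```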

In the next chapter, we will see how the attention matrix resolves this ambiguity, step by step.

2. Self-attention: how context is injected into embeddings

2.1 Self-attention, not just attention

We first clarify what kind of attention we are using here. This chapter focuses on self-attention.

Self-attention means that each word looks at the other words of the same input sequence.

In this simplified example, we make an additional pedagogical choice. We assume that Queries and Keys are directly equal to the input embeddings. In other words, there are no learned weight matrices for Q and K in this chapter.

This is a deliberate simplification. It allows us to focus entirely on the attention mechanism, without introducing extra parameters. Similarity between words is computed directly from their embeddings.

Conceptually, this means:
Q = Input
K = Input

Only the Value vectors are used later to propagate information to the output.

In real Transformer models, Q, K, and V are all obtained through learned linear projections. Those projections add flexibility, but they do not change the logic of attention itself. The simplified version shown here captures the core idea.
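
In code, this chapter's simplification is nothing more than aliasing. A minimal sketch, assuming the toy embeddings above stacked into one matrix, one row per word:

```python
import numpy as np

# One row per word of "cat and mouse are friends" (toy values from above).
X = np.array([[0.9, 0.0, 0.0, 0.1],   # cat
              [0.0, 0.0, 0.9, 0.0],   # and
              [0.5, 0.5, 0.0, 0.0],   # mouse
              [0.0, 0.0, 0.9, 0.0],   # are
              [0.3, 0.0, 0.1, 0.6]])  # friends

# The chapter's pedagogical simplification: no learned projections yet.
Q, K, V = X, X, X
```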

Here is the whole picture that we will decompose.

2.2 From input embeddings to raw attention scores

We start from the input embedding matrix, where each row corresponds to a word and each column corresponds to a semantic dimension.

The first operation is to compare every word with every other word. This is done by computing dot products between Queries and Keys.

Because Queries and Keys are equal to the input embeddings in this example, this step reduces to computing dot products between input vectors.

All dot products are computed at once using a matrix multiplication:
Scores = Input × Inputᵀ

Each cell of this matrix answers a simple question: how similar are these two words, given their embeddings?

At this stage, the values are raw scores. They are not probabilities, and they do not yet have a direct interpretation as weights.
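
Here is that step as a sketch, still with the illustrative toy values:

```python
import numpy as np

X = np.array([[0.9, 0.0, 0.0, 0.1],   # cat
              [0.0, 0.0, 0.9, 0.0],   # and
              [0.5, 0.5, 0.0, 0.0],   # mouse
              [0.0, 0.0, 0.9, 0.0],   # are
              [0.3, 0.0, 0.1, 0.6]])  # friends

# scores[i, j] = dot(word_i, word_j): every pair compared at once.
scores = X @ X.T                       # shape (5, 5)

# Row for "mouse" (index 2): apart from itself, "cat" gets the largest
# raw score, because both vectors share a large animal component.
print(scores[2])                       # [0.45, 0.0, 0.5, 0.0, 0.15]
```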

2.3 Scaling and normalization

Raw dot products can grow large as the embedding dimension increases. To keep values in a stable range, the scores are scaled by the square root of the embedding dimension.

ScaledScores = Scores / √d

This scaling step is not conceptually deep, but it is practically important. It prevents the next step, the softmax, from becoming too sharp.

Once scaled, a softmax is applied row by row. This converts raw scores into positive values that sum to one.

The result is the attention matrix.

And attention is all you need.

Each row of this matrix describes how much attention a given word pays to every other word in the sentence.
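
As a sketch, scaling and the row-wise softmax look like this (same illustrative toy matrix as before):

```python
import numpy as np

X = np.array([[0.9, 0.0, 0.0, 0.1],   # cat
              [0.0, 0.0, 0.9, 0.0],   # and
              [0.5, 0.5, 0.0, 0.0],   # mouse
              [0.0, 0.0, 0.9, 0.0],   # are
              [0.3, 0.0, 0.1, 0.6]])  # friends

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

d = X.shape[1]                               # embedding dimension (4 here)
attention = softmax(X @ X.T / np.sqrt(d))    # scale, then softmax row by row

print(attention[2])       # how much "mouse" attends to each word
print(attention.sum(1))   # every row sums to 1.0
```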

2.4 Interpreting the attention matrix

The attention matrix is the central object of self-attention.

For a given word, its row in the attention matrix answers the question: when updating this word, which other words matter, and how much?

For example, the row corresponding to mouse assigns higher weights to words that are semantically related in the current context. In the sentence with cat and friends, mouse attends more to animal-related words. In the sentence with keyboard and useful, it attends more to technical words.

The mechanism is identical in both cases. Only the surrounding words change the outcome.

2.5 From attention weights to output embeddings

The attention matrix itself is not the final result. It is a set of weights.

To produce the output embeddings, we combine these weights with the Value vectors.

Output = Attention × V

In this simplified example, the Value vectors are taken directly from the input embeddings. Each output word vector is therefore a weighted average of the input vectors, with weights given by the corresponding row of the attention matrix.

For a word like mouse, this means that its final representation becomes a mixture of:

  • its own embedding
  • the embeddings of the words it attends to most

This is the precise moment where context is injected into the representation.
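
Putting the whole simplified pipeline together as a sketch, we can watch the mixing happen for the word mouse:

```python
import numpy as np

X = np.array([[0.9, 0.0, 0.0, 0.1],   # cat
              [0.0, 0.0, 0.9, 0.0],   # and
              [0.5, 0.5, 0.0, 0.0],   # mouse
              [0.0, 0.0, 0.9, 0.0],   # are
              [0.3, 0.0, 0.1, 0.6]])  # friends

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

attention = softmax(X @ X.T / np.sqrt(X.shape[1]))

# V = Input in this chapter, so each output row is a weighted average
# of the input rows: this is where context gets mixed in.
output = attention @ X

print(X[2])        # "mouse" before: animal and tech perfectly balanced
print(output[2])   # "mouse" after: the animal component now outweighs tech
```

Running the same sketch on the keyboard sentence tilts the mixture the other way, toward the tech dimension.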

At the end of self-attention, the embeddings are no longer ambiguous.

The word mouse no longer has the same representation in both sentences. Its output vector reflects its context. In one case, it behaves like an animal. In the other, it behaves like a technical object.

Nothing in the embedding table changed. What changed is how information was combined across words.

This is the core idea of self-attention, and the foundation on which Transformer models are built.

If we now compare the two examples, cat and mouse on the left and keyboard and mouse on the right, the effect of self-attention becomes explicit.

In both cases, the input embedding of mouse is identical. Yet the final representation differs. In the sentence with cat, the output embedding of mouse is dominated by the animal dimension. In the sentence with keyboard, the technical dimension becomes more prominent. Nothing in the embedding table changed. The difference comes entirely from how attention redistributed weights across words before mixing the values.

This comparison highlights the role of self-attention: it does not change words in isolation, but reshapes their representations by taking the full context into account.

3. Learning how to mix information

Transformers in Excel – all images by author

3.1 Introducing learned weights for Q, K, and V

Until now, we have focused on the mechanics of self-attention itself. We now introduce an important element: learned weights.

In a real Transformer, Queries, Keys, and Values are not taken directly from the input embeddings. Instead, they are produced by learned linear transformations.

For each word embedding, the model computes:
Q = Input × W_Q
K = Input × W_K
V = Input × W_V

These weight matrices are learned during training.

At this stage, we usually keep the same dimensionality. The input embeddings, Q, K, V, and the output embeddings all have the same number of dimensions. This makes the role of attention easier to understand: it modifies representations without changing the space they live in.
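
As a sketch of the shapes involved, with random matrices standing in for the weights that training would learn:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 5, 4                       # same dimensionality kept throughout

X = rng.normal(size=(n_words, d))       # stand-in for the input embeddings

# Learned projections (random stand-ins here; training sets their values).
W_Q = rng.normal(size=(d, d))
W_K = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))

Q = X @ W_Q                             # shape (5, 4)
K = X @ W_K                             # shape (5, 4)
V = X @ W_V                             # shape (5, 4)
```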

Conceptually, these weights allow the model to decide:

  • which aspects of a word matter for comparison (Q and K)
  • which aspects of a word should be transmitted to others (V)

3.2 What the model actually learns

The attention mechanism itself is fixed. Dot products, scaling, softmax, and matrix multiplications always work the same way. What the model actually learns are the projections.

By adjusting the Q and K weights, the model learns how to measure relationships between words for a given task. By adjusting the V weights, it learns what information should be propagated when attention is high. The structure defines how information flows, while the weights define what information flows.

Because the attention matrix depends on Q and K, it is partially interpretable. We can inspect which words attend to which others and observe patterns that often align with syntax or semantics.

This becomes clear when comparing the same word in two different contexts. In both examples, the word mouse starts with exactly the same input embedding, containing both an animal and a tech component. On its own, it is ambiguous.

What changes is not the word, but the attention it receives. In the sentence with cat and friends, attention emphasizes animal-related words. In the sentence with keyboard and useful, attention shifts toward technical words. The mechanism and the weights are identical in both cases, yet the output embeddings differ. The difference comes entirely from how the learned projections interact with the surrounding context.

This is precisely why the attention matrix is interpretable: it reveals which relationships the model has learned to consider meaningful for the task.

3.3 Changing the dimensionality on purpose

Nothing, however, forces Q, K, and V to have the same dimensionality as the input. The only hard constraint is that Q and K must match each other, so that their dot products are defined.

The Value projection, in particular, can map embeddings into a space of a different size. When this happens, the output embeddings inherit the dimensionality of the Value vectors.

This is not a theoretical curiosity. It is exactly what happens in real models, especially in multi-head attention. Each head operates in its own subspace, often with a smaller dimension, and the results are later concatenated into a larger representation.
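
Here is a minimal multi-head sketch, assuming two heads that each work in half of an eight-dimensional model space. Real implementations apply one more learned projection after the concatenation; it is omitted here to keep the shape story visible:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads             # each head lives in a smaller subspace

X = rng.normal(size=(n_words, d_model))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

heads = []
for _ in range(n_heads):
    W_Q = rng.normal(size=(d_model, d_head))   # project down to d_head dims
    W_K = rng.normal(size=(d_model, d_head))
    W_V = rng.normal(size=(d_model, d_head))
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    A = softmax(Q @ K.T / np.sqrt(d_head))     # per-head attention matrix
    heads.append(A @ V)                        # output has d_head columns

# Concatenating the heads restores the full model dimensionality.
out = np.concatenate(heads, axis=-1)           # shape (5, 8)
```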

So attention can do two things:

  • mix information across words
  • reshape the space in which this information lives

This explains why Transformers scale so well.

They do not rely on fixed features. They learn:

  • how to compare words
  • how to route information
  • how to project meaning into different spaces

The attention matrix controls where information flows.
The learned projections control what information flows and how it is represented.

Together, they form the core mechanism behind modern language models.

Conclusion

This Advent Calendar was built around a simple idea: understanding machine learning models by looking at how they actually transform data.

Transformers are a fitting way to close this journey. They do not rely on fixed rules or local patterns, but on learned relationships between all elements of a sequence. Through attention, they turn static embeddings into contextual representations, which is the foundation of modern language models.

Thank you again to everyone who followed this series, shared feedback, and supported it, especially the Towards Data Science team.

Merry Christmas 🎄


All the Excel files are available through this Kofi link. Your support means a lot to me. The price will increase during the month, so early supporters get the best value.


Tags: artificial intelligence, data science, Deep Dives, deep learning, Transformer