Ai News

The Machine Learning “Advent Calendar” Day 22: Embeddings in Excel

by AiNEWS2025
2025-12-23
in Machine Learning


On Day 22 of this series, we will talk about deep learning.

And when people talk about deep learning, we immediately think of images of deep neural network architectures, with many layers, neurons, and parameters.

In practice, the real shift introduced by deep learning is elsewhere.

It is about learning data representations.

In this article, we focus on text embeddings, explain their role in the machine learning landscape, and show how they can be understood and explored in Excel.

1. Classic machine learning vs. deep learning

In this part, we discuss why embeddings were introduced.

1.1 Where does deep learning fit?

To understand embeddings, we first need to clarify the place of deep learning.

We will use the term classic machine learning to describe methods that do not rely on deep architectures.

All the previous articles in this series dealt with classic machine learning, which can be described in two complementary ways.

Learning paradigms

  • Supervised learning
  • Unsupervised learning

Model families

  • Distance-based models
  • Tree-based models
  • Weight-based models

Across this series, we have already studied the learning algorithms behind these models. In particular, we have seen that gradient descent applies to all weight-based models, from linear regression to neural networks.

Deep learning is often reduced to neural networks with many layers.

But this explanation is incomplete.

From an optimization point of view, deep learning does not introduce a new learning rule.

So what does it introduce?

1.2 Deep learning as data representation learning

Deep learning is about how features are created.

Instead of manually designing features, deep learning learns representations automatically, often through multiple successive transformations.

This also raises an important conceptual question:

Where is the boundary between feature engineering and model learning?

Some examples make this clearer:

  • Polynomial regression is still a linear model, but the features are polynomial
  • Kernel methods project data into a high-dimensional feature space
  • Density-based methods implicitly transform the data before learning
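To make the first bullet concrete, here is a minimal sketch (NumPy, with made-up noiseless data) showing that polynomial regression is ordinary linear least squares applied to engineered features: the "polynomial" part lives entirely in the features, while the model stays linear in its weights.

```python
import numpy as np

# Toy data following a quadratic curve: y = 1 + 2x - 3x^2 (illustrative values)
x = np.linspace(-2, 2, 21)
y = 1 + 2 * x - 3 * x**2

# Feature engineering: expand x into the columns [1, x, x^2]
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# The model itself is still linear in its weights: ordinary least squares
weights, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(weights, 6))  # recovers the coefficients 1, 2, -3
```

Deep learning pushes the same split further: instead of choosing the feature expansion by hand, the expansion itself is learned.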

Deep learning continues this idea, but at scale.

From this perspective, deep learning belongs to:

  • the feature engineering philosophy, for representation
  • the weight-based model family, for learning

1.3 Images and convolutional neural networks

Images are represented as pixels.

From a technical point of view, image data is already numerical and structured: a grid of numbers. However, the information contained in these pixels is not structured in a way that classical models can easily exploit.

Pixels do not explicitly encode: edges, shapes, textures, or objects.

Convolutional Neural Networks (CNNs) are designed to build this information from pixels. They apply filters to detect local patterns, then progressively combine them into higher-level representations.
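The filtering step can be sketched as a plain 2D convolution. This is a minimal illustration, not the Excel implementation: the 3×3 vertical-edge filter is hand-set here as an assumption, whereas in a CNN its values would be learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum elementwise products.
    Like most deep learning libraries, this is technically cross-correlation (no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5x6 grayscale image: dark left half (0.0), bright right half (1.0)
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-set vertical edge detector; in a CNN these weights are learned
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # each row is [0, 3, 3, 0]: nonzero only around the edge
```

The pixels never "said" where the edge was; the filter turns raw intensities into a feature that encodes it.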

I have published an article showing how CNNs can be implemented in Excel to make this process explicit.

CNN in Excel – all images by author

For images, the challenge is not to make the data numerical, but to extract meaningful representations from already numerical data.

1.4 Text data: a different problem

Text presents a fundamentally different challenge.

Unlike images, text is not numerical by nature.

Before modeling context or order, the first problem is more basic:

How do we represent words numerically?

Creating a numerical representation of text is the first step.

In deep learning for text, this step is handled by embeddings.

Embeddings transform discrete symbols (words) into vectors that models can work with. Once embeddings exist, we can model context, word order, and relationships between words.
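Mechanically, an embedding is a lookup: each word gets an integer id, and the id selects one row of an embedding table. In this minimal sketch the table is random and the embedding size of 4 is an arbitrary choice; in a real model the table values are learned.

```python
import numpy as np

text = "a wonderful movie with a wonderful cast"
tokens = text.split()

# Vocabulary: each distinct word gets an integer id (in order of first appearance)
vocab = {word: i for i, word in enumerate(dict.fromkeys(tokens))}
ids = [vocab[t] for t in tokens]

# Embedding table: one row (vector) per word; random here, learned in practice
embedding_dim = 4
rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

# Lookup: the text becomes a (num_tokens, embedding_dim) matrix
vectors = embedding_table[ids]
print(ids)            # [0, 1, 2, 3, 0, 1, 4]
print(vectors.shape)  # (7, 4)
```

Repeated words such as "a" and "wonderful" map to the same row, so they share one vector wherever they appear.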

In this article, we focus on this first and essential step:
how embeddings create numerical representations for text, and how this process can be explored in Excel.

2. Two ways to learn text embeddings

In this article, we will use the IMDB movie reviews dataset to illustrate both approaches. The dataset is distributed under the Apache License 2.0.

There are two main ways to learn embeddings for text, and we will do both with this dataset:

  • supervised: we will learn embeddings while predicting the sentiment of each review
  • unsupervised or self-supervised: we will use the Word2Vec algorithm

In both cases, the goal is the same:
to transform words into numerical vectors that can be used by machine learning models.

Before comparing these two approaches, we first need to clarify what embeddings are and how they relate to classic machine learning.

IMDB dataset, image by author (Apache License 2.0)

2.1 Embeddings and classic machine learning

In classic machine learning, categorical data is usually handled with:

  • label encoding, which assigns fixed integers but introduces artificial order
  • one-hot encoding, which removes order but produces high-dimensional sparse vectors

How they can be used depends on the model family.

Distance-based models cannot effectively use one-hot encoding, because all categories end up equally distant from each other. Label encoding could work only if we could assign meaningful numerical values to the categories, which is generally not possible in classic approaches.
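The geometry behind this paragraph is easy to check numerically. In this sketch the three color categories are a hypothetical example: label encoding makes "red" twice as far from "blue" as from "green" for no reason, while one-hot encoding puts every pair at the same distance, √2.

```python
import numpy as np

categories = ["red", "green", "blue"]  # hypothetical categorical feature

# Label encoding: fixed integers impose an artificial order red < green < blue
labels = np.array([[0.0], [1.0], [2.0]])

# One-hot encoding: one dimension per category
one_hot = np.eye(len(categories))

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

print(pairwise_distances(labels))   # red-blue (2.0) looks "farther" than red-green (1.0)
print(pairwise_distances(one_hot))  # every off-diagonal entry is sqrt(2)
```

Neither encoding lets the distances reflect any actual relationship between categories, which is exactly what a learned representation can provide.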

Weight-based models can use one-hot encoding, because the model learns a weight for each category. In contrast, with label encoding, the numerical values are fixed and cannot be adjusted to represent meaningful relationships.

Tree-based models use variables only through splits rather than through their numerical magnitudes, which makes label encoding acceptable in practice. However, most implementations, including scikit-learn, still require numerical inputs. As a result, categories must be converted to numbers, either through label encoding or one-hot encoding. If these numerical values carried semantic meaning, this would again be beneficial.

Overall, this highlights a limitation of classic approaches:
category values are fixed and not learned.

Embeddings extend this idea by learning the representation itself.
Each word is associated with a trainable vector, turning the representation of categories into a learning problem rather than a preprocessing step.
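The link between one-hot encoding, weight-based models, and embeddings can be shown in a few lines. In this sketch the vocabulary size, embedding size, and word id are arbitrary: multiplying a one-hot vector by a weight matrix simply selects one row, so an embedding layer is a linear layer on one-hot inputs, implemented as a row lookup.

```python
import numpy as np

vocab_size, embedding_dim = 5, 3
rng = np.random.default_rng(0)

# Trainable embedding matrix: one learned vector (row) per word
W = rng.normal(size=(vocab_size, embedding_dim))

word_id = 2
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# A linear layer applied to a one-hot vector just selects one row of W ...
via_matmul = one_hot @ W
# ... so an embedding layer can implement it as a cheap row lookup instead
via_lookup = W[word_id]

print(np.allclose(via_matmul, via_lookup))  # True
```

Because the one-hot input is zero everywhere except at `word_id`, the gradient with respect to `W` is nonzero only in that row, so training adjusts exactly the vectors of the words that appear.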

2.2 Supervised embeddings

In supervised learning, embeddings are learned as part of a prediction task.

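For example, each review in the IMDB dataset is labeled with its sentiment, so embeddings can be learned as part of a sentiment analysis task.

In our case, we can use a very simple architecture: each word is mapped to a one-dimensional embedding, the embeddings of a review are averaged, and the average is passed to a logistic regression layer.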

This is possible because the objective is binary sentiment classification.
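This architecture can be sketched in a few lines of NumPy. It is a minimal sketch under our own assumptions, not the author's training setup: six made-up reviews stand in for IMDB, training is plain per-review gradient descent on binary cross-entropy, and the learning rate and epoch count are chosen only for illustration.

```python
import numpy as np

# Hypothetical toy reviews (not the IMDB data): 1 = positive, 0 = negative
reviews = [
    ("an excellent and wonderful movie", 1),
    ("i love this fun movie", 1),
    ("wonderful acting excellent story", 1),
    ("the worst movie a total waste", 0),
    ("horrible acting and a boring story", 0),
    ("worst plot waste of time", 0),
]

vocab = sorted({t for text, _ in reviews for t in text.split()})
index = {t: i for i, t in enumerate(vocab)}

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=len(vocab))  # one-dimensional embedding per word
w, b = 1.0, 0.0                             # logistic regression layer
lr = 0.5                                    # illustrative learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(text):
    ids = [index[t] for t in text.split() if t in index]
    return sigmoid(w * E[ids].mean() + b)

for epoch in range(500):
    for text, y in reviews:
        ids = [index[t] for t in text.split()]
        avg = E[ids].mean()               # average of the word embeddings
        p = sigmoid(w * avg + b)          # sentiment probability
        dz = p - y                        # d(binary cross-entropy)/dz
        grad_E = dz * w / len(ids)        # each token gets an equal share
        w -= lr * dz * avg
        b -= lr * dz
        np.subtract.at(E, ids, lr * grad_E)  # accumulates correctly for repeated words
```

Since only the product `w * E[word]` affects the prediction, the sign of the raw embeddings depends on the sign of `w`; this matches the remark below that the sign can be inverted.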

Once training is complete, we can export the embeddings and explore them in Excel.

When plotting the embeddings on the x-axis and word frequency on the y-axis, a clear pattern appears:

  • positive values are associated with words such as excellent or wonderful;
  • negative values are associated with words such as worst or waste.

Depending on the initialization, the sign can be inverted, since the logistic regression layer also has parameters that influence the final prediction.

Finally, in Excel, we reconstruct the full pipeline that corresponds to the architecture defined earlier.

Input column
The input text (a review) is cut into words, and each row corresponds to one word.

Embedding search
Using a lookup function, the embedding value associated with each word is retrieved from the embedding table learned during training.

Global average
The global average embedding is computed by averaging the embeddings of all words seen so far. This corresponds to a very simple sentence representation: the mean of word vectors.

Probability prediction
The averaged embedding is then passed through a logistic function to produce a sentiment probability.
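The four Excel columns can be mirrored row by row in Python. The embedding values, weight, and bias below are hypothetical numbers chosen for illustration (not the values exported from the trained model), and treating unknown words as a neutral 0.0 is our assumption.

```python
import math

# Hypothetical learned values (illustrative only)
embedding_table = {"excellent": 2.0, "love": 1.5, "fun": 1.2,
                   "worst": -2.2, "horrible": -1.8, "waste": -1.6}
weight, bias = 1.0, 0.0

def pipeline(review):
    """One row per word: (word, embedding, running average, probability)."""
    rows = []
    running_sum = 0.0
    for n, word in enumerate(review.lower().split(), start=1):
        emb = embedding_table.get(word, 0.0)  # lookup; unknown words count as neutral
        running_sum += emb
        avg = running_sum / n                 # average of all words seen so far
        prob = 1 / (1 + math.exp(-(weight * avg + bias)))
        rows.append((word, emb, avg, prob))
    return rows

for row in pipeline("A fun movie I love"):
    print("%-8s emb=%+.2f avg=%+.3f p=%.3f" % row)
```

Each printed line corresponds to one spreadsheet row, and the last row holds the prediction for the full review.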

What we observe

  • Words with strongly positive embeddings (for example excellent, love, fun) push the average upward.
  • Words with strongly negative embeddings (for example worst, horrible, waste) pull the average downward.
  • Neutral or weakly weighted words have little influence.

As more words are added, the global average embedding stabilizes, and the sentiment prediction becomes more confident.

2.3 Word2Vec: embeddings from co-occurrence

In Word2Vec, similarity does not mean that two words have the same meaning.
It means that they appear in similar contexts.

Word2Vec learns word embeddings by looking at which words tend to co-occur within a fixed window in the text. Two words are considered similar if they often appear around the same neighboring words, even if their meanings are opposite.
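The "fixed window" can be made concrete with the training pairs used by the skip-gram variant of Word2Vec. This sketch only generates the (center, context) pairs; the neural training on those pairs (for example with negative sampling) is not shown, and the window size of 2 is an arbitrary choice.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the movie was really good".split(), window=2)
print([p for p in pairs if p[0] == "was"])
# [('was', 'the'), ('was', 'movie'), ('was', 'really'), ('was', 'good')]
```

Word2Vec adjusts the vectors so that each center word predicts its context words well; words that appear with the same contexts therefore end up with similar vectors.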

As shown in the Excel sheet below, we compute the cosine similarity for the word good and retrieve the most similar words.

Consider sentences such as “the movie was good” and “the movie was bad”. From the model’s perspective, the surrounding words are almost identical. The only thing that changes is the adjective itself.

As a result, Word2Vec learns that “good” and “bad” play a similar role in language, even though their meanings are opposite.

So, Word2Vec captures distributional similarity, not semantic polarity.

A useful way to think about it is:

Words are close if they are used in the same places.
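The good/bad effect can be reproduced with a simpler, count-based stand-in for Word2Vec: represent each word by how often each other word appears within its window, then compare words with cosine similarity. This is not Word2Vec itself (which learns dense vectors by prediction), and the five-sentence corpus is made up, but it shows the same distributional principle.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus
corpus = [
    "the movie was good",
    "the movie was bad",
    "the acting was good",
    "the acting was bad",
    "the popcorn was salty",
]

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, top_n=3):
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]

vectors = cooccurrence_vectors(corpus)
print(most_similar("good", vectors))  # "bad" comes first, with similarity 1.0
```

Raw counts also make very frequent words such as "the" look similar to almost everything; this sketch does not correct for that.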

2.4 How embeddings are used

In modern systems such as RAG (Retrieval-Augmented Generation), embeddings are often used to retrieve documents or passages for question answering.
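The retrieval step of such a system can be sketched as a nearest-neighbor search by cosine similarity. The passages and their 3-dimensional vectors below are hypothetical placeholders; in a real RAG system the vectors would come from an embedding model and have hundreds of dimensions.

```python
import numpy as np

# Hypothetical passages with made-up embedding vectors
passages = [
    "Word2Vec learns word vectors from co-occurrence",
    "Logistic regression outputs a probability",
    "CNN filters detect local patterns in images",
]
passage_vectors = np.array([[0.9, 0.1, 0.0],
                            [0.1, 0.8, 0.2],
                            [0.0, 0.2, 0.9]])

# Hypothetical embedding of the question "How does Word2Vec build word vectors?"
query_vector = np.array([0.8, 0.2, 0.1])

def retrieve(query_vec, doc_vecs, k=2):
    """Return (index, cosine similarity) of the k closest documents."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = D @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]

for i, score in retrieve(query_vector, passage_vectors):
    print(f"{score:.3f}  {passages[i]}")
```

The ranking reflects vector proximity only; nothing checks that a retrieved passage actually answers the question.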

However, this approach has limitations.

Most commonly used embeddings are trained in a self-supervised way, based on co-occurrence or contextual prediction objectives. As a result, they capture general language similarity, not task-specific meaning.

This means that:

  • embeddings may retrieve text that is linguistically similar but not relevant
  • semantic proximity does not guarantee answer correctness

Other embedding strategies can be used, including task-adapted or supervised embeddings, but they often remain self-supervised at their core.

Understanding how embeddings are created, what they encode, and what they do not encode is therefore essential before using them in downstream systems such as RAG.

Conclusion

Embeddings are learned numerical representations of words that make similarity measurable.

Whether learned through supervision or through co-occurrence, embeddings map words to vectors based on how they are used in data. By exporting them to Excel, we can inspect these representations directly, compute similarities, and understand what they capture and what they do not.

This makes embeddings less mysterious and clarifies their role as a foundation for more complex systems such as retrieval or RAG.
