The Machine Learning “Advent Calendar” Day 2: k-NN Classifier in Excel

AiNEWS2025 by AiNEWS2025
2025-12-03
in Machine Learning


After the k-NN Regressor and the idea of prediction based on distance, we now look at the k-NN Classifier.

The principle is the same, but classification allows us to introduce several useful variants, such as Radius Nearest Neighbors, Nearest Centroid, multi-class prediction, and probabilistic distance models.

So we will first implement the k-NN classifier, then discuss how it can be improved.

You can use this Excel/Google sheet while reading this article to better follow all the explanations.

k-NN classifier in Excel – image by author

Titanic survival dataset

We will use the Titanic survival dataset, a classic example where each row describes a passenger with features such as class, sex, age, and fare, and the goal is to predict whether the passenger survived.

Titanic survival dataset – image by author – CC0: Public Domain license

Principle of k-NN for Classification

k-NN classifier is so similar to k-NN regressor that I could almost write one single article to explain them both.

In fact, when we look for the k nearest neighbors, we do not use the value y at all, let alone its nature.

But there are still some interesting facts about how classifiers (binary or multi-class) are built, and how the features can be handled differently.

We begin with binary classification, then move on to multi-class classification.

One Continuous Feature for Binary Classification

So, very quickly, we can do the same exercise for one continuous feature, with this dataset.

For the value of y, we usually use 0 and 1 to distinguish the two classes. But as you will notice, this can be a source of confusion.

k-NN classifier in Excel – One continuous feature – image by author

Now, think about it: 0 and 1 are also numbers, right? So we can follow exactly the same process as if we were doing a regression.

That’s right. Nothing changes in the computation, as you see in the following screenshot. And you can of course try to modify the value of the new observation yourself.

k-NN classifier in Excel – prediction for one continuous feature – image by author

The only difference is how we interpret the result. When we take the “average” of the neighbors’ y values, this number is understood as the probability that the new observation belongs to class 1.

So strictly speaking, “average” is not the right interpretation: the value is rather the proportion of class 1 among the neighbors.

We can also manually create this plot, to show how the predicted probability changes over a range of x values.

Traditionally, to avoid ending up with a 50 percent probability, we choose an odd value for k, so that we can always decide with majority voting.

k-NN classifier in Excel – predictions for one continuous feature – image by author
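The same computation can be sketched outside the spreadsheet. This is a minimal illustration in plain Python, assuming a made-up toy dataset (not the values from the sheet); the function name `knn_proba_1d` is hypothetical.

```python
def knn_proba_1d(x_train, y_train, x_new, k=3):
    """Proportion of class 1 among the k training points closest to x_new."""
    # Rank training points by absolute distance to the new observation.
    ranked = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x_new))
    neighbors = ranked[:k]
    # With 0/1 labels, the mean of y is the share of class 1.
    return sum(y for _, y in neighbors) / k

# Toy data, invented for illustration only.
x_train = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y_train = [0, 0, 0, 1, 1, 1]

p = knn_proba_1d(x_train, y_train, x_new=5.0, k=3)
label = 1 if p > 0.5 else 0  # majority vote; an odd k avoids p == 0.5
print(p, label)
```

With an odd k and two classes, the proportion can never be exactly 0.5, which is why the majority vote always gives an answer.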

Two Features for Binary Classification

If we have two features, the operation is also almost the same as in k-NN regressor.

k-NN classifier in Excel – two continuous features – image by author
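As a rough sketch of the two-feature case, only the distance changes: we use the Euclidean distance between points instead of an absolute difference. The data below is invented, and both features are assumed to be on comparable scales.

```python
import math

def knn_proba_2d(X_train, y_train, x_new, k=3):
    """Proportion of class 1 among the k nearest points (Euclidean distance)."""
    distances = [math.dist(row, x_new) for row in X_train]
    # Indices of the training points, closest first.
    order = sorted(range(len(X_train)), key=lambda i: distances[i])
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k

# Toy data, invented for illustration only.
X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y_train = [0, 0, 0, 1, 1, 1]

p_far = knn_proba_2d(X_train, y_train, (4.0, 4.0), k=3)
p_near = knn_proba_2d(X_train, y_train, (0.5, 0.5), k=3)
print(p_far, p_near)
```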

One feature for multi-class classification

Now, let’s take an example of three classes for the target variable y.

Then we can see that we cannot use the notion of “average” anymore, since the number that represents the category is not actually a number. It is better to call them “category 0”, “category 1”, and “category 2”.

k-NN classifier in Excel – multi-class classifier – image by author
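For several classes, the average is replaced by a count of votes per category. Here is a minimal sketch, assuming string labels and an invented toy dataset; the function name is hypothetical.

```python
from collections import Counter

def knn_predict_multiclass(x_train, y_train, x_new, k=3):
    """Return the majority category and the per-category proportions."""
    ranked = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x_new))
    votes = Counter(label for _, label in ranked[:k])
    proportions = {label: count / k for label, count in votes.items()}
    # most_common(1) returns the category with the highest count.
    predicted = votes.most_common(1)[0][0]
    return predicted, proportions

# Toy data, invented for illustration only.
x_train = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
y_train = ["category 0"] * 3 + ["category 1"] * 3 + ["category 2"] * 3

predicted, proportions = knn_predict_multiclass(x_train, y_train, x_new=7.5, k=3)
print(predicted, proportions)
```

Note that with three or more classes, an odd k no longer rules out ties (for example one vote each with k = 3), so a tie-breaking rule is still needed in general.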

From k-NN to Nearest Centroids

When k Becomes too Large

Now, let’s make k large. How large? As large as possible.

Remember, we also did this exercise with k-NN regressor, and the conclusion was that if k equals the total number of observations in the training dataset, then k-NN regressor is the simple average-value estimator.

For the k-NN classifier, it is almost the same. If k equals the total number of observations, then for each class, we will get its overall proportion inside the entire training dataset.

Some people, from a Bayesian point of view, call these proportions the priors!

But this does not help us much to classify a new observation, because these priors are the same for every point.
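To make this concrete, here is a small sketch of what k = N produces: the class proportions of the whole training set, whatever the new observation is. The labels are invented for illustration.

```python
from collections import Counter

def class_priors(y_train):
    """Overall proportion of each class in the training labels."""
    counts = Counter(y_train)
    n = len(y_train)
    return {label: count / n for label, count in counts.items()}

# Toy labels, invented for illustration only.
y_train = [0, 0, 0, 1, 1]
priors = class_priors(y_train)
print(priors)  # the same answer for every new observation
```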

The Creation of Centroids

So let us take one more step.

For each class, we can also group together all the feature values x that belong to that class, and compute their average.

These averaged feature vectors are what we call centroids.

What can we do with these centroids?

We can use them to classify a new observation.

Instead of recalculating distances to the entire dataset for every new point, we simply measure the distance to each class centroid and assign the class of the nearest one.

With the Titanic survival dataset, we can start with a single feature, age, and compute the centroids for the two classes: passengers who survived and passengers who did not.

k-NN classifier in Excel – Nearest Centroids – image by author

Now, it is also possible to use multiple continuous features.

For example, we can use the two features age and fare.

k-NN classifier in Excel – Nearest Centroids – image by author
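The two steps above (one centroid per class, then the nearest centroid wins) can be sketched as follows. The (age, fare) values are made up and are not the real Titanic data; the function names are hypothetical.

```python
import math

def fit_centroids(X_train, y_train):
    """Average feature vector of each class."""
    groups = {}
    for row, label in zip(X_train, y_train):
        groups.setdefault(label, []).append(row)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }

def predict_nearest_centroid(centroids, x_new):
    """Class whose centroid is closest to x_new (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x_new))

# Toy (age, fare) rows, invented for illustration only.
X_train = [(22.0, 7.0), (30.0, 8.0), (40.0, 9.0),
           (25.0, 60.0), (35.0, 80.0), (45.0, 100.0)]
y_train = [0, 0, 0, 1, 1, 1]

centroids = fit_centroids(X_train, y_train)
print(centroids)
print(predict_nearest_centroid(centroids, (33.0, 70.0)))
```

The fitted model is just one tuple per class, which is all we need to store.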

And we can discuss some important characteristics of this model:

  • The scale is important, as we discussed before for k-NN regressor.
  • Missing values are not a problem here: when we compute the centroids per class, each one is calculated with the available (non-empty) values.
  • We went from the most “complex” and “large” model (in the sense that the actual model is the entire training dataset, so we have to store the whole dataset) to the simplest model (we only use one value per feature and per class, and we only store these values as our model).
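The point about missing values can be illustrated with a short sketch, assuming that a missing value is represented by `None` (an assumption of this example). Each feature is averaged over the values that are present, in the same way a spreadsheet average skips empty cells.

```python
def centroid_ignoring_missing(rows):
    """Per-feature average over the available (non-None) values."""
    centroid = []
    for column in zip(*rows):
        present = [value for value in column if value is not None]
        # If a feature is missing everywhere, no average can be computed.
        centroid.append(sum(present) / len(present) if present else None)
    return tuple(centroid)

# Toy rows for one class, invented for illustration only.
rows_class_1 = [(20.0, 10.0), (None, 30.0), (40.0, None)]
centroid = centroid_ignoring_missing(rows_class_1)
print(centroid)
```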

From highly nonlinear to naively linear

But now, can you think of one major drawback?

Whereas the basic k-NN classifier is highly nonlinear, the Nearest Centroid method is extremely linear.

In this 1D example, the two centroids are simply the average x values of class 0 and class 1. With only two centroids, the decision boundary is just the midpoint between them.

So instead of a piecewise, jagged boundary that depends on the exact location of many training points (as in k-NN), we obtain a straight cutoff that only depends on two numbers.

This illustrates how Nearest Centroids compresses the entire dataset into a simple and very linear rule.

k-NN classifier in Excel – Nearest Centroids linearity – image by author
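A quick way to convince ourselves of this: in 1D, comparing the distances to two centroids is the same as comparing x with their midpoint. The centroid values below are hypothetical.

```python
def predict_1d(c0, c1, x):
    """Nearest-centroid rule with two 1D centroids (ties go to class 0)."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1

# Hypothetical centroids, with c0 < c1.
c0, c1 = 28.0, 32.0
boundary = (c0 + c1) / 2

# The distance rule and the single cutoff agree on a whole grid of x values.
same_rule = all(
    predict_1d(c0, c1, x) == (0 if x <= boundary else 1)
    for x in range(0, 61)
)
print(boundary, same_rule)
```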

A note on regression: why centroids do not apply

Now, this kind of improvement is not possible for the k-NN regressor. Why?

In classification, each class forms a group of observations, so computing the average feature vector for each class makes sense, and this gives us the class centroids.

But in regression, the target y is continuous. There are no discrete groups, no class boundaries, and therefore no meaningful way to compute “the centroid of a class”.

A continuous target has infinitely many possible values, so we cannot group observations by their y value to form centroids.

The only possible “centroid” in regression would be the global mean, which corresponds to the case k = N in k-NN regressor.

And this estimator is far too simple to be useful.

In short, Nearest Centroids Classifier is a natural improvement for classification, but it has no direct equivalent in regression.

Further statistical improvements

What else can we do with the basic k-NN classifier?

Average and variance

With the Nearest Centroids Classifier, we used the simplest statistic, the average. A natural reflex in statistics is to add the variance as well.

The distance is then no longer Euclidean but the Mahalanobis distance. Using this distance, we get a probability based on the distribution characterized by the mean and variance of each class.
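As a rough 1D sketch of this idea, the distance to each class mean can be divided by the class standard deviation, which is what the Mahalanobis distance reduces to with a single feature. Only the distance part is shown here, not a full probability model, and the data is invented.

```python
import statistics

def fit_mean_std(x_train, y_train):
    """Mean and (population) standard deviation of x for each class."""
    stats = {}
    for label in set(y_train):
        values = [x for x, y in zip(x_train, y_train) if y == label]
        stats[label] = (statistics.mean(values), statistics.pstdev(values))
    return stats

def predict_euclidean(stats, x_new):
    return min(stats, key=lambda label: abs(x_new - stats[label][0]))

def predict_standardized(stats, x_new):
    # Distance measured in units of each class's standard deviation.
    return min(stats, key=lambda label: abs(x_new - stats[label][0]) / stats[label][1])

# Toy data: class 0 is tightly grouped, class 1 is widely spread.
x_train = [9.0, 10.0, 11.0, 10.0, 20.0, 30.0]
y_train = [0, 0, 0, 1, 1, 1]

stats = fit_mean_std(x_train, y_train)
print(predict_euclidean(stats, 14.0), predict_standardized(stats, 14.0))
```

In this toy case the two rules disagree for x = 14: the point is closer to the mean of class 0, but it is many standard deviations away from that tight class, while it sits well inside the spread of class 1.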

Categorical Features handling

For categorical features, we cannot compute averages or variances. And for k-NN regressor, we saw that it was possible to do one-hot encoding or ordinal/label encoding. But the scale is important and not easy to determine.

Here, we can do something equally meaningful, in terms of probabilities: we can count the proportions of each category inside a class.

These proportions act exactly like probabilities, describing how likely each category is within each class.

This idea is directly linked to models such as Categorical Naive Bayes, where classes are characterized by frequency distributions over the categories.
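Counting these proportions is straightforward. The sketch below uses a made-up `sex` column and made-up `survived` labels, not the real Titanic counts.

```python
from collections import Counter, defaultdict

def category_proportions(categories, labels):
    """For each class, the share of each category among its observations."""
    counts = defaultdict(Counter)
    for category, label in zip(categories, labels):
        counts[label][category] += 1
    return {
        label: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for label, counter in counts.items()
    }

# Toy data, invented for illustration only.
sex = ["female", "female", "female", "male", "male", "male", "male", "female"]
survived = [1, 1, 1, 1, 0, 0, 0, 0]

props = category_proportions(sex, survived)
print(props)
```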

Weighted Distance

Another direction is to introduce weights, so that closer neighbors count more than distant ones. In scikit-learn, there is the “weights” argument that allows us to do so.
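One common weighting scheme is the inverse of the distance, which is also what scikit-learn's `weights="distance"` option uses. The sketch below is a plain-Python illustration of that scheme on invented data, not scikit-learn's implementation; the small `eps` term is an assumption added here to avoid dividing by zero.

```python
from collections import defaultdict

def weighted_knn_predict(x_train, y_train, x_new, k=3, eps=1e-9):
    """k-NN vote where each neighbor counts for 1 / distance."""
    ranked = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x_new))
    scores = defaultdict(float)
    for x, label in ranked[:k]:
        scores[label] += 1.0 / (abs(x - x_new) + eps)
    return max(scores, key=scores.get)

# Toy data: one very close class-1 point, two farther class-0 points.
x_train = [4.9, 7.0, 7.5, 0.0]
y_train = [1, 0, 0, 0]

print(weighted_knn_predict(x_train, y_train, x_new=5.0, k=3))
```

Here a plain majority vote over the three nearest neighbors would give class 0 (two votes against one), while the weighted vote gives class 1 because the single class-1 neighbor is much closer.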

We can also switch from “k neighbors” to a fixed radius around the new observation, which leads to radius-based classifiers.

Radius Nearest Neighbors

Sometimes the following graphic is used to explain the k-NN classifier. But with a radius drawn like this, it better reflects the idea of Radius Nearest Neighbors.

One advantage is the control of the neighborhood. It is especially interesting when we know the concrete meaning of the distance, such as the geographical distance.

Radius Nearest Neighbors classifier – image by author

But the drawback is that you have to know the radius in advance.

By the way, this notion of radius nearest neighbors is also suitable for regression.
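A minimal sketch of the radius variant, on invented 1D data: instead of keeping the k closest points, we keep every point within a fixed radius. Returning `None` for an empty neighborhood is a choice made for this example.

```python
def radius_neighbors_proba(x_train, y_train, x_new, radius):
    """Proportion of class 1 among training points within `radius` of x_new."""
    inside = [y for x, y in zip(x_train, y_train) if abs(x - x_new) <= radius]
    if not inside:
        return None  # no neighbor inside the radius: no prediction
    return sum(inside) / len(inside)

# Toy data, invented for illustration only.
x_train = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y_train = [0, 0, 0, 1, 1, 1]

p_wide = radius_neighbors_proba(x_train, y_train, x_new=4.0, radius=2.0)
p_narrow = radius_neighbors_proba(x_train, y_train, x_new=4.0, radius=0.5)
print(p_wide, p_narrow)
```

The second call shows the practical side of the drawback: with a radius that is too small, the neighborhood can be empty. Replacing the proportion with the average of a continuous y gives the regression version.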

Recap of different variants

All these small changes give different models, each one trying to improve the basic idea of comparing neighbors according to a more complex definition of distance, with a control parameter that allows us to get either local neighbors or a more global characterization of the neighborhood.

We will not explore all these models here. I simply cannot stop myself from going a bit too far when a small variation naturally leads to another idea.

For now, consider this as an announcement of the models we will implement later this month.

Variants and improvements of k-NN classifier – image by author

Conclusion

In this article, we explored the k-NN classifier from its most basic form to several extensions.

The central idea does not really change: a new observation is classified by looking at how similar it is to the training data.

But this simple idea can take many different shapes.

With continuous features, similarity is based on geometric distance.
With categorical features, we look instead at how often each category appears among the neighbors.

When k becomes very large, the entire dataset collapses into just a few summary statistics, which leads naturally to the Nearest Centroids Classifier.

Understanding this family of distance-based and probability-based ideas helps us see that many machine-learning models are simply different ways of answering the same question:

Which class does this new observation most resemble?

In the next articles, we will continue exploring density-based models, which can be understood as global measures of similarity between observations and classes.
