Today’s model is Logistic Regression.
If you already know this model, here is a question for you:
Is Logistic Regression a regressor or a classifier?
Well, this question is exactly like: Is a tomato a fruit or a vegetable?
From a botanist’s viewpoint, a tomato is a fruit, because they look at structure: seeds, flowers, plant biology.
From a cook’s viewpoint, a tomato is a vegetable, because they look at taste, how it is used in a recipe, whether it goes in a salad or a dessert.
The same object, two valid answers, because the point of view is different.
Logistic Regression is exactly like that.
- In the Statistical / GLM perspective, it is a regression. The concept of “classification” does not even exist in this framework: there is gamma regression, logistic regression, Poisson regression…
- In the machine learning perspective, it is used for classification. So it is a classifier.
We will come back to this later.
For now, one thing is sure:
Logistic Regression is very well suited to cases where the target variable is binary, and usually y is coded as 0 or 1.
But…
What is a classifier for a weight-based model?
So, y can be 0 or 1.
0 or 1, they are numbers, right?
So we can just consider y as continuous!
Yes, y = a x + b, with y = 0 or 1.
Why not?
Now, you may ask: why this question now? Why was it not asked before?
Well, for distance-based and tree-based models, a categorical y is truly categorical.
When y is categorical, like red, blue, green, or simply 0 and 1:
- In K-NN, you classify by looking at the classes of the nearest neighbors.
- In centroid models, you compare with the centroid of each class.
- In a decision tree, you compute class proportions at each node.
In all these models:
Class labels are not numbers.
They are categories.
The algorithms never treat them as values.
So classification is natural and immediate.
But for weight-based models, things work differently.
In a weight-based model, we always compute something like:
y = a x + b
or, later, a more complex function with coefficients.
This means:
The model works with numbers everywhere.
So here is the key idea:
If the model does regression, then this same model can be used for binary classification.
Yes, we can use linear regression for binary classification!
Since binary labels are 0 and 1, they are already numeric.
And in this special case, we can apply Ordinary Least Squares (OLS) directly on y = 0 and y = 1.
The model will fit a line, and we can use the same closed-form formula, as we can see below.
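To make this concrete outside of Excel, here is a minimal sketch in Python; the toy dataset is invented for the illustration, not the one used in the article's screenshots.

```python
import numpy as np

# Hypothetical toy data: small x values labelled 0, larger ones labelled 1
x = np.array([1, 2, 3, 4, 6, 8, 12, 14, 16, 18], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Closed-form OLS for a line y = a*x + b:
#   a = cov(x, y) / var(x),   b = mean(y) - a * mean(x)
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

# The output is numeric: the fitted line, evaluated at each x
y_hat = a * x + b
print(a, b)
print(y_hat)
```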

We can also run the same gradient descent as before, and it will work perfectly:
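Here is the same idea with gradient descent on the MSE, again only a sketch: the dataset is the same made-up one as above, and the learning rate and number of iterations are arbitrary choices.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 6, 8, 12, 14, 16, 18], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

a, b = 0.0, 0.0   # initial values
lr = 0.005        # learning rate, arbitrary for this sketch

for _ in range(20000):
    y_hat = a * x + b
    grad_a = 2 * np.mean((y_hat - y) * x)   # gradient of the MSE w.r.t. a
    grad_b = 2 * np.mean(y_hat - y)         # gradient of the MSE w.r.t. b
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)   # close to the closed-form OLS solution
```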

And then, to obtain the final class prediction, we simply choose a threshold.
It is usually 0.5 (or 50 percent), but depending on how strict you want to be, you can pick another value.
- If the predicted y ≥ 0.5, predict class 1
- Otherwise, predict class 0
This is a classifier.
And because the model produces a numeric output, we can even identify the point where y = 0.5.
This value of x defines the decision frontier: setting ax + b = 0.5 gives x = (0.5 − b) / a.
In the previous example, this happens at x = 9.
At this threshold, we already saw one misclassification.
But a problem appears as soon as we introduce a point with a large value of x.
For example, suppose we add a point with x = 50 and y = 1.
Because linear regression tries to fit a straight line through all the data, this single extreme value of x drags the line toward it and changes its slope.
The decision frontier shifts from x = 9 to approximately x = 12.
And now, with this new boundary, we end up with two misclassifications.

This illustrates the main issue:
A linear regression used as a classifier is extremely sensitive to extreme values of x. The decision frontier moves dramatically, and the classification becomes unstable.
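To see this sensitivity in action, here is a small sketch on a made-up dataset (the numbers differ from the article's example, but the behaviour is the same): we compute the decision frontier with and without a single extreme point.

```python
import numpy as np

def frontier(x, y):
    """Fit y = a*x + b by OLS and return the x where the line crosses 0.5."""
    a, b = np.polyfit(x, y, 1)
    return (0.5 - b) / a

# Hypothetical binary dataset
x = np.array([1, 2, 3, 4, 6, 8, 12, 14, 16, 18], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
print("frontier without the extreme point:", frontier(x, y))

# Add a single point with a very large x and y = 1, then refit
x2 = np.append(x, 50.0)
y2 = np.append(y, 1.0)
print("frontier with the extreme point:   ", frontier(x2, y2))
# The boundary moves noticeably: one point has shifted the whole classifier.
```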
This is one of the reasons we need a model that does not behave linearly forever. A model that stays between 0 and 1, even when x becomes very large.
And this is exactly what the logistic function will give us.
How Logistic Regression works
We start with ax + b, just like in linear regression.
Then we apply a function called the sigmoid, or logistic function:
p(x) = 1 / (1 + e^(−(ax + b)))
As we can see in the screenshot below, the value of p(x) is then between 0 and 1, so this is perfect.
- p(x) is the predicted probability that y = 1
- 1 − p(x) is the predicted probability that y = 0
For classification, we can simply say:
- If p(x) ≥ 0.5, predict class 1
- Otherwise, predict class 0
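As a minimal sketch of this rule in Python (the coefficients a and b are placeholders here; in practice they come out of the training described below):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, a, b):
    """p(x): predicted probability that y = 1."""
    return sigmoid(a * x + b)

def predict_class(x, a, b, threshold=0.5):
    """Class 1 if p(x) >= threshold, class 0 otherwise."""
    return (predict_proba(x, a, b) >= threshold).astype(int)

# Placeholder coefficients, just for illustration
x = np.array([1.0, 5.0, 9.0, 13.0, 50.0])
a, b = 0.8, -7.0
print(predict_proba(x, a, b))   # probabilities stay between 0 and 1, even at x = 50
print(predict_class(x, a, b))
```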

From likelihood to log-loss
Now, the OLS Linear Regression tries to minimize the MSE (Mean Squared Error).
Logistic regression for a binary target uses the Bernoulli likelihood. For each observation i:
- If yᵢ = 1, the probability of the data point is pᵢ
- If yᵢ = 0, the probability of the data point is 1 − pᵢ
For the whole dataset, the likelihood is the product over all i. In practice, we take the logarithm, which turns the product into a sum.
In the GLM perspective, we try to maximize this log likelihood.
In the machine learning perspective, we define the loss as the negative log likelihood and we minimize it. This gives the usual log-loss.
And the two are equivalent; we will not do the demonstration here.
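Written out with the notation above (n observations, pᵢ the predicted probability for observation i), the likelihood and the resulting log-loss are:

$$\mathcal{L}(a, b) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{\,1 - y_i}$$

$$\text{log-loss} = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\Big]$$

Maximizing the first expression (or its logarithm) and minimizing the second are the same problem, up to the sign and the 1/n factor.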

Gradient Descent for Logistic Regression
Principle
Just as we did for Linear Regression, we can also use Gradient Descent here. The idea is always the same:
- Start from some initial values of a and b.
- Compute the loss and its gradient (derivatives) with respect to a and b.
- Move a and b a little bit in the direction that reduces the loss.
- Repeat.
Nothing mysterious.
Just the same mechanical process as before.
Step 1. Gradient Calculation
For logistic regression, the gradients of the average log-loss follow a very simple structure: the gradient with respect to b is the average of the residuals (pᵢ − yᵢ), and the gradient with respect to a is the average of (pᵢ − yᵢ) · xᵢ.
So it is simply the average residual, weighted by xᵢ for the slope.
This is the result we can implement directly in Excel. As you can see, it is quite simple in the end, even if the log-loss formula can look complex at first glance.
Excel can compute these two quantities with straightforward SUMPRODUCT formulas.
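Outside the spreadsheet, the same two quantities can be written in a few lines of Python; the function below is only a sketch, with x, y and p assumed to be arrays of the same length.

```python
import numpy as np

def gradients(x, y, p):
    """Gradients of the average log-loss when p = sigmoid(a*x + b).

    grad_a = mean((p - y) * x)   # slope
    grad_b = mean(p - y)         # intercept
    """
    residual = p - y
    return np.mean(residual * x), np.mean(residual)
```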

Step 2. Parameter Update
Once the gradients are known, we update the parameters: a is replaced by a − η · (gradient with respect to a), and b by b − η · (gradient with respect to b), where η is the learning rate.
This update step is repeated at each iteration.
And iteration after iteration, the loss goes down, and the parameters converge to the optimal values.
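Putting all the pieces together, here is a hedged end-to-end sketch in Python, mirroring what the Excel sheet does iteration by iteration. The toy dataset, the learning rate and the number of iterations are arbitrary choices for the example (one overlapping point is included so that the optimum stays finite).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary dataset, with one overlapping point (x=6 is a 1, x=8 is a 0)
x = np.array([1, 2, 3, 4, 6, 8, 12, 14, 16, 18], dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

a, b = 0.0, 0.0   # initial values
lr = 0.01         # learning rate, arbitrary for this sketch

for i in range(20000):
    p = sigmoid(a * x + b)
    # Average log-loss, just to watch it decrease
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Gradients: simple averages of the residuals
    grad_a = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    # Parameter update
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b, loss)
```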

We now have the whole picture.
You have seen the model, the loss, the gradients, and the parameter updates.
And with the detailed view of each iteration in Excel, you can actually play with the model: change a value, watch the curve move, and see the loss decrease step by step.
It is surprisingly satisfying to observe how everything fits together so clearly.

What about multiclass classification?
For distance-based and tree-based models:
No issue at all.
They naturally handle multiple classes because they never interpret the labels as numbers.
But for weight-based models?
Here we hit a problem.
If we encode the classes as numbers: 1, 2, 3, etc.
Then the model will interpret these numbers as real numeric values.
Which leads to problems:
- the model thinks class 3 is “bigger” than class 1
- the midpoint between class 1 and class 3 is class 2
- distances between classes become meaningful
But none of this is true in classification.
So:
For weight-based models, we cannot just use y = 1, 2, 3 for multiclass classification.
This encoding is incorrect.
We will see later how to fix this.
Conclusion
Starting from a simple binary dataset, we saw how a weight-based model can act as a classifier, why linear regression quickly reaches its limits, and how the logistic function solves these problems by keeping predictions between 0 and 1.
Then, by expressing the model through likelihood and log-loss, we obtained a formulation that is both mathematically sound and easy to implement.
And once everything is placed in Excel, the entire learning process becomes visible: the probabilities, the loss, the gradients, the updates, and finally the convergence of the parameters.
With the detailed iteration table, you can actually see how the model improves step by step.
You can change a value, adjust the learning rate, or add a point, and instantly observe how the curve and the loss react.
This is the real value of doing machine learning in a spreadsheet: nothing is hidden, and every calculation is transparent.
By building logistic regression this way, you not only understand the model, you understand why it is trained.
And this intuition will stay with you as we move to more advanced models later in the Advent Calendar.