...

The Machine Learning “Advent Calendar” Day 8: Isolation Forest in Excel


After working with Decision Trees, both for Regression and Classification, we will continue to use the principle of Decision Trees today.

And this time, we are in unsupervised learning, so there are no labels.

The algorithm is called Isolation Forest, and the idea is to build many decision trees to form a forest. The principle is to detect anomalies by isolating them.

To keep everything easy to understand, let’s take a very simple example dataset that I created myself:

1, 2, 3, 9

(And because the chief editor of TDS reminded me about legal details about mentioning the source of the data, let me state this properly: this dataset is fully copyrighted by myself. It is a four-point dataset that I handcrafted, and I am happy to grant everyone the right to use it for educational purposes.)

The goal here is simple: find the anomaly, the intruder.

I know you already see which one it is.

As always, the idea is to turn this into an algorithm that can detect it automatically.

Anomaly Detection in the Classic ML Framework

Before going further, let us take one step back and see where anomaly detection sits in the bigger picture.

Classic ML framework – image by author

On the left, we have supervised learning, with labeled data and two main types:

  • Regression when the target is numerical
  • Classification when the target is categorical

This is where we have used Decision Trees so far.

On the right, we have unsupervised learning, with no labels.

We do not predict anything. We simply manipulate the observations (clustering and anomaly detection) or manipulate the features (dimensionality reduction and other methods).

Dimensionality reduction manipulates the features. Even though it sits in the “unsupervised” category, its goal is quite different from the others. Since it reshapes the features themselves, it almost feels like feature engineering.

For observation-level methods, we have two possibilities:

  • Clustering: group observations
  • Anomaly detection: assign a score to each observation

In practice, some models can do both at the same time. For example, k-means can also be used to detect anomalies.

Isolation Forest, on the other hand, is designed for anomaly detection only, not clustering.

So, today, we are exactly here:
Unsupervised learning → Clustering / Anomaly detection → Anomaly detection

The Painful Part: Building Trees in Excel

Now we begin the implementation in Excel, and I have to be honest: this part is really painful…

It is painful because we need to build many small rules, and the formulas are not easy to drag. This is one of the limitations of Excel when the model is based on decisions. Excel is great when the formulas look the same for every row. But here, each node in the tree follows a different rule, so the formulas do not generalize easily.

For Decision Trees, we saw that with a single split, the formula worked. But I stopped there on purpose. Why? Because adding more splits in Excel becomes complicated. The structure of a decision tree is not naturally “drag-friendly”.

However, for Isolation Forest, we have no choice.

We need to build a full tree, all the way down, to see how each point is isolated.

If you, dear readers, have ideas to simplify this, please contact me.

Isolation Forest in 3 Steps

Even though the formulas are not easy, I tried my best to structure the approach. Here is the entire method in just three steps.

Isolation Forest in Excel – image by author

1. Isolation Tree Construction

We start by creating one isolation tree.

At each node, we pick a random split value between the minimum and maximum of the current group.

This split divides the observations into “left” (L) and “right” (R).

When an observation becomes isolated, I mark it as F for “Final”, meaning it has reached a leaf.

By repeating this process, we obtain a full binary tree where anomalies tend to be isolated in fewer steps. For each observation, we can then count its depth, which is simply the number of splits needed to isolate it.

Isolation Forest in Excel – image by author
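If you prefer reading code to screenshots, here is a minimal Python sketch of a single split (the names are mine, not cells from the spreadsheet): take the current group, read its min and max, draw a random cut, and separate the points into L and R, just like the RAND and IF formulas do in Excel.

```python
import random

group = [1, 2, 3, 9]                      # current group of observations

lo, hi = min(group), max(group)           # step 1: min and max of the group
cut = random.uniform(lo, hi)              # step 2: random split value (like RAND in Excel)

left = [x for x in group if x <= cut]     # "L": observations <= split
right = [x for x in group if x > cut]     # "R": observations > split

print(f"cut={cut:.2f}  L={left}  R={right}")
# If one side contains a single observation, that point is isolated ("F").
```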

2. Average Depth Calculation

One tree is not enough. So we repeat the same random process several times to build multiple trees.

For each data point, we count how many splits were needed to isolate it in each tree.

Then we compute the average depth (or average path length) across all trees.

This gives a stable and meaningful measure of how easy it is to isolate each point.

At this point, the average depth already gives us a solid indicator:
the lower the depth, the more likely the point is an anomaly.

A short depth means the point is isolated very quickly, which is a signature of an anomaly.

A long depth means the point behaves like the rest of the data: such points stay grouped together and are not easy to separate.
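To make the averaging concrete, here is a hedged Python sketch (my own layout, not the exact spreadsheet logic) that grows a few fully random isolation trees on the four-point dataset and averages the number of splits needed to isolate each observation.

```python
import random

def isolation_depth(point, group, depth=0):
    """Number of random splits needed to isolate `point` inside `group`."""
    if len(group) <= 1:
        return depth
    cut = random.uniform(min(group), max(group))   # random split between min and max
    if point <= cut:
        side = [x for x in group if x <= cut]      # follow the point into its side
    else:
        side = [x for x in group if x > cut]
    return isolation_depth(point, side, depth + 1)

data = [1, 2, 3, 9]
n_trees = 5

# Average depth over several random trees, one value per observation.
avg_depth = {
    x: sum(isolation_depth(x, data) for _ in range(n_trees)) / n_trees
    for x in data
}
print(avg_depth)   # 9 should usually get the smallest average depth
```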

In our example, the average depths make perfect sense.

  • First, 9 is the anomaly, with an average depth of 1. In all 5 trees, one split is enough to isolate it. (Although this is not always the case, as you can test yourself.)
  • For the other three observations, the depths are similar and noticeably larger. The largest average depth is attributed to 2, which sits in the middle of the group, and this is exactly what we expect.

If one day you have to explain this algorithm to someone else, feel free to use this dataset: easy to remember and intuitive to illustrate. And please, do not forget to mention my copyright on it!

Isolation Forest in Excel – image by author

3. Anomaly Score Calculation

The final step is to normalize the average depth to obtain a standard anomaly score between 0 and 1.

Saying that an observation has a given average depth does not mean much by itself.

This value depends on the total number of data points, so we cannot interpret it directly as “normal” or “anomalous”.

The idea is to compare the average path length of each point to a typical value expected under pure randomness. This tells us how surprising (or not) the depth really is.

We will see the transformation later, but the goal is simple:
turn the raw depth into a relative score that makes sense without any context.

Short depths will naturally become scores close to 1 (anomalies),
and long depths will become scores close to 0 (normal observations).

And finally, some implementations adjust the score so that it has a different meaning: positive values indicate normal points, and negative values indicate anomalies. This is simply a transformation of the original anomaly score.

The underlying logic does not change at all: short paths still correspond to anomalies, and long paths correspond to normal observations.

Isolation Forest in Excel – image by author

Isolation Tree Building

So this is the painful part.

Quick Overview

I created a table to capture the different steps of the tree-building process.

It is not regular, and it is not perfectly structured, but I tried my best to make it readable.

And I am not sure that all the formulas generalize well.

Isolation Forest in Excel – image by author
  1. Get the minimum and maximum values of the current group.
  2. Generate a random split value between this min and max.
  3. Split the observations into left (L) and right (R).
  4. Count how many observations fall into L and R.
  5. If a group contains only one observation, mark it as F (Final) and stop for that branch.
  6. Repeat the process for every non-final group until all observations are isolated.

This is the entire logic of building one isolation tree.

Developed Explanation

We begin with all the observations together.

The first step is to look at the minimum and maximum of this group. These two values define the interval where we can make a random cut.

Next, we generate a random split value somewhere between the min and max. Unlike decision trees, there is no optimization, no criterion, no impurity measure. The split is purely random.

We can use RAND in Excel, as you can see in the following screenshot.

Isolation Forest in Excel – image by author

Once we have the random split, we divide the data into two groups:

  • Left (L): observations less than or equal to the split
  • Right (R): observations greater than the split

This is simply done by comparing each observation with the split value using an IF formula.

Isolation Forest in Excel – image by author

After the split, we count how many observations went to each side.
If one of these groups contains only one observation, this point is now isolated.

We mark it as F for “Final”, meaning it sits in a leaf and no further splitting is needed for that branch.

The VLOOKUP is used to find, in the table of counts, the observations that end up alone (a count of 1) on their side.

Isolation Forest in Excel – image by author

For all other groups that still contain multiple observations, we repeat exactly the same process.

We stop only when every observation is isolated, meaning each one appears in its own final leaf. The full structure that emerges is a binary tree, and the number of splits needed to isolate each observation is its depth.

Here, we know that 3 splits are enough.

At the end, you get the final table of one fully grown isolation tree.
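If you want to cross-check the Excel table, this small Python sketch (again, my own naming, only mirroring the logic described above) prints one possible tree node by node: the group, its min and max, the random cut, the resulting L and R sides, and an F mark when a side contains a single observation.

```python
import random

def grow_tree(group, name="root"):
    """Recursively split a group, printing each node like a row of the Excel table."""
    if len(group) <= 1:
        return
    lo, hi = min(group), max(group)
    cut = random.uniform(lo, hi)
    left = [x for x in group if x <= cut]
    right = [x for x in group if x > cut]
    tag = lambda g: " (F)" if len(g) == 1 else ""   # single observation -> final leaf
    print(f"{name}: min={lo} max={hi} cut={cut:.2f}  L={left}{tag(left)}  R={right}{tag(right)}")
    grow_tree(left, name + "-L")
    grow_tree(right, name + "-R")

grow_tree([1, 2, 3, 9])
```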

Anomaly Score Calculation

The part about averaging the depths just repeats the same process for several trees, so you can copy and paste.

Now, I will give more details about the anomaly score calculation.

Normalization factor

To compute the anomaly score, Isolation Forest first needs a normalizing factor called c(n).

This value represents the expected depth of a random point in a random binary search tree with n observations.

Why do we need it?

Because we want to compare the actual depth of a point to the typical depth expected under randomness.

A point that is isolated much faster than expected is likely an anomaly.

The formula for c(n) uses harmonic numbers.
A harmonic number H(k) is approximately:

H(k) ≈ ln(k) + γ

where γ ≈ 0.5772156649 is the Euler–Mascheroni constant.

Using this approximation, the normalizing factor becomes:

c(n) = 2·H(n−1) − 2·(n−1)/n

Then we can calculate this number in Excel.

Isolation Forest in Excel – image by author
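As a quick sanity check of the spreadsheet value, a tiny Python version of the same calculation for our n = 4 looks like this:

```python
import math

GAMMA = 0.5772156649          # Euler–Mascheroni constant
n = 4                         # number of observations in our dataset
c_n = 2 * (math.log(n - 1) + GAMMA) - 2 * (n - 1) / n
print(c_n)                    # ≈ 1.85
```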

Once we have c(n), the anomaly score is:

s(x) = 2^(−h(x) / c(n))

where h(x) is the average depth needed to isolate the point across all trees.

If the score is close to 0, the point is normal

If the score is close to 1, the point is an anomaly

So we can transform the depths into scores.

Isolation Forest in Excel – image by author

Finally, for the adjusted score, we can use an offset, namely the average value of the anomaly scores, and translate the scores by this amount.

Isolation Forest in Excel – image by author
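Putting the pieces of step 3 together, here is a hedged Python sketch: it computes c(n) with the approximation above, turns an average depth into the score 2^(−h/c(n)), and applies one possible version of the offset adjustment (I assume the offset is the mean score and flip the sign so that positive means normal; the exact convention in your spreadsheet or library may differ). The average depths below are illustrative, not the exact numbers from my Excel file.

```python
import math

GAMMA = 0.5772156649  # Euler–Mascheroni constant

def c(n):
    """Normalizing factor: expected depth in a random binary search tree with n points."""
    if n <= 1:
        return 0.0
    return 2 * (math.log(n - 1) + GAMMA) - 2 * (n - 1) / n

def anomaly_score(avg_depth, n):
    """s = 2^(-h / c(n)): close to 1 for anomalies, close to 0 for normal points."""
    return 2 ** (-avg_depth / c(n))

# Illustrative average depths for 1, 2, 3, 9 (yours will vary because of RAND)
avg_depths = {1: 2.4, 2: 2.8, 3: 2.4, 9: 1.0}
n = len(avg_depths)

scores = {x: anomaly_score(h, n) for x, h in avg_depths.items()}
offset = sum(scores.values()) / len(scores)            # assumed offset: mean of the scores
adjusted = {x: offset - s for x, s in scores.items()}  # positive ≈ normal, negative ≈ anomaly

print(scores)
print(adjusted)
```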

Additional Elements in Real Algorithm

In practice, Isolation Forest includes a few extra steps that make it more robust.

1. Choose a subsample of the data
Instead of using the full dataset for every tree, the algorithm picks a small random subset.

This reduces computation and adds diversity between trees.
It also helps prevent the model from being overwhelmed by very large datasets.

So it seems that a name like “Random Isolation Forest” is more suitable, right?

2. Pick a random feature first
When building each split, Isolation Forest does not always use the same feature.

It first selects a feature at random, then chooses a random split value within that feature.

This makes the trees even more diverse and helps the model work well on datasets with many variables.

These simple additions make Isolation Forest surprisingly powerful for real-world applications.

This is again what a "Random Isolation Forest" would do; this name is definitely better!
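For comparison with a real implementation, here is a short sketch using scikit-learn's IsolationForest (assuming scikit-learn is installed). It already bundles the subsampling and the random feature choice described above, and its decision_function follows the same sign convention as the adjusted score: positive for normal points, negative for anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([[1], [2], [3], [9]])    # the four-point dataset, one feature

model = IsolationForest(
    n_estimators=100,      # number of isolation trees in the forest
    max_samples="auto",    # subsample used to grow each tree
    random_state=0,
)
model.fit(X)

print(model.predict(X))             # +1 for normal points, -1 for anomalies
print(model.decision_function(X))   # positive ≈ normal, negative ≈ anomaly
```

With only four points the exact labels depend on the random trees, but 9 should typically receive the lowest decision_function value.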

Advantages of Isolation Forest

Compared with many distance-based models, Isolation Forest has several important advantages:

  • Works with categorical features
    Distance-based methods struggle with categories, but Isolation Forest can handle them more naturally.
  • Handles many features easily
    High-dimensional data is not a problem.
    The algorithm does not rely on distance metrics that break in high dimensions.
  • No assumptions about distributions
    There is no need for normality, no density estimation, no distances to compute.
  • Scales well to high dimensions
    Its performance does not collapse when the number of features grows.
  • Very fast
    Splitting is trivial: pick a feature, pick a random value, cut.
    No optimization step, no gradient, no impurity calculation.

Isolation Forest also has a very refreshing way of thinking:

Instead of asking “What should normal points look like?”,
Isolation Forest asks, “How fast can I isolate this point?”

This simple change of perspective solves many difficulties of classical anomaly detection.

Conclusion

Isolation Forest is an algorithm that looks complicated from the outside, but once you break it down, the logic is actually very simple.

The Excel implementation is painful, yes. But the idea is not.
And once you understand the idea, everything else becomes much easier: how the trees work, why the depth matters, how the score is computed, and why the algorithm works so well in practice.

Isolation Forest does not try to model “normal” behavior. Instead, it asks a completely different question: how fast can I isolate this observation?

This small change of perspective solves many problems that distance-based or density-based models struggle with.
