Why Most Cross-Validation Visualizations Are Wrong (And How to Fix Them) | by Samy Baladram | Nov, 2024


MODEL VALIDATION & OPTIMIZATION

Stop using moving boxes!

You know those cross-validation diagrams in every data science tutorial? The ones showing boxes in different colors moving around to explain how we split data for training and testing? Like this one:

Have you seen that? Image by author.

I’ve seen them too — one too many times. These diagrams are common — they’ve become the go-to way to explain cross-validation. But here’s something interesting I noticed while looking at them as both a designer and data scientist.

When we look at a yellow box moving to different spots, our brain automatically sees it as one box moving around.

It’s just how our brains work — when we see something similar move to a new spot, we think it’s the same thing. (This is actually why cartoons and animations work!)

You might think the animated version is better, but now you can't help following the blue box, and you start to forget that this is supposed to represent how cross-validation works. Source: Wikipedia

But here’s the thing: In these diagrams, each box in a new position is supposed to show a different chunk of data. So while our brain naturally wants to track the boxes, we have to tell our brain, “No, no, that’s not one box moving — they’re different boxes!” It’s like we’re fighting against how our brain naturally works, just to understand what the diagram means.

Looking at this as someone who works with both design and data, I started thinking: maybe there’s a better way? What if we could show cross-validation in a way that actually works with how our brain processes information?

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

Cross-validation is about making sure machine learning models work well in the real world. Instead of testing a model once, we test it multiple times using different parts of our data. This helps us understand how the model will perform with new, unseen data.

Here’s what happens:

  1. We take our data
  2. Divide it into groups
  3. Use some groups for training, others for testing
  4. Repeat this process with different groupings

The goal is to get a reliable understanding of our model’s performance. That’s the core idea — simple and practical.
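If you'd rather see those four steps as code, here's a minimal sketch. The tiny dataset and the decision tree below are placeholders for the example, not the setup from my other articles.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 1. Take our data (10 rows, 2 features, placeholder values)
rng = np.random.default_rng(42)
X = rng.normal(size=(10, 2))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# 2. Divide it into groups (here: 3 groups of row indices)
groups = np.array_split(np.arange(len(X)), 3)

# 3. Use some groups for training, others for testing
# 4. Repeat, letting each group take a turn as the test set
scores = []
for test_idx in groups:
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(scores)           # one score per split
print(np.mean(scores))  # the overall performance estimate we're after
```

Averaging the per-split scores is what gives us that more reliable picture of performance.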

(Note: We’ll discuss different validation techniques and their applications in another article. For now, let’s focus on understanding the basic concept and why current visualization methods need improvement.)

Open up any machine learning tutorial, and you’ll probably see these types of diagrams:

  • Long boxes split into different sections
  • Arrows showing parts moving around
  • Different colors showing training and testing data
  • Multiple versions of the same diagram side by side
Currently, this is similar to the first image you’ll see if you look up “Cross Validation.” (Image by author)

Here are the issues with such diagrams:

Not Everyone Sees Colors the Same Way

Colors create practical problems when showing data splits. Some people can’t differentiate certain colors, while others may not see colors at all. The visualization fails when printed in black and white or viewed on different screens where colors vary. Using color as the primary way to distinguish data parts means some people miss important information due to their color perception.

Not everyone sees the same colors. Image by author.

Colors Make Things Harder to Remember

Another thing about colors: they might look like they help explain things, but they actually create extra work for our brain. When we use different colors for different parts of the data, we have to actively remember what each color represents. This becomes a memory task instead of helping us understand the actual concept. The connection between colors and data splits isn't natural or obvious; it's something we have to learn and keep track of while trying to understand cross-validation itself.

Our brain doesn’t naturally connect colors with data splits.

These are the colors we used in the previous diagrams. Why is the original dataset green? Why is it then split into blue and red?

Too Much Information at Once

The current diagrams also suffer from information overload. They attempt to display the entire cross-validation process in a single visualization, which creates unnecessary complexity. Multiple arrows, extensive labeling, all competing for attention. When we try to show every aspect of the process at the same time, we make it harder to focus on understanding each individual part. Instead of clarifying the concept, this approach adds an extra layer of complexity that we need to decode first.

Too many labels, too many colors, too many arrows; it's too hard to focus.

Movement That Misleads

Movement in these diagrams creates a fundamental misunderstanding of how cross-validation actually works. When we show arrows and flowing elements, we’re suggesting a sequential process that doesn’t exist in reality. Cross-validation splits don’t need to happen in any particular order — the order of splits doesn’t affect the results at all.

These diagrams also give the wrong impression that data physically moves during cross-validation. In reality, we’re simply selecting different rows from our original dataset each time. The data stays exactly where it is, and we just change which rows we use for testing in each split. When diagrams show data flowing between splits, they add unnecessary complexity to what should be a straightforward process.

While diagrams typically flow from top to bottom, it’s hard to follow the sequence of operations. The timing of model training and the calculation results remain unclear. When does the training happen? What results come from each calculation?
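To convince yourself that nothing actually moves, here's a small sketch assuming scikit-learn's KFold: the splitter only hands back row indices, and the original dataset stays exactly where it is, no matter which order we process the splits in.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 rows of data
X_before = X.copy()

for train_idx, test_idx in KFold(n_splits=3).split(X):
    # Each split is just a different selection of rows, not a move
    X_train, X_test = X[train_idx], X[test_idx]

# After all three splits, the original dataset hasn't moved or changed
assert np.array_equal(X, X_before)
```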

What We Need Instead

We need diagrams that:

  • Don’t just rely on colors to explain things
  • Show information in clear, separate chunks
  • Make it obvious that different test groups are independent
  • Don’t use unnecessary arrows and movement

Let’s fix this. Instead of trying to make our brains work differently, why don’t we create something that feels natural to look at?

Let's try something different. First, this is how data looks to most people: rows and columns of numbers with an index.

This is the common dataset I used for my articles on classification algorithms.

Inspired by that structure, here's a diagram that makes more sense.

Simpler but clear depiction of cross-validation.

Here’s why this design makes more sense logically:

  1. True Data Structure: It matches how data actually works in cross-validation. In practice, we're selecting different portions of our dataset, not moving data around. Each column shows exactly which portion of the data we're using for testing in that split.
  2. Independent Splits: Each split explicitly shows it’s different data. Unlike moving boxes that might make you think “it’s the same test set moving around,” this shows that Split 2 is using completely different data from Split 1. This matches what’s actually happening in your code.
  3. Data Conservation: By keeping the column height the same throughout all folds, we’re showing an important rule of cross-validation: you always use your entire dataset. Some portions for testing, the rest for training. Every piece of data gets used, nothing is left out.
  4. Complete Coverage: Looking left to right, you can easily check an important cross-validation principle: every portion of your dataset will be used as test data exactly once.
  5. Three-Fold Simplicity: We specifically use 3-fold cross-validation here because:
    a. It clearly demonstrates the key concepts without overwhelming detail
    b. The pattern is easy to follow: three distinct folds, three test sets. Simple enough to mentally track which portions are being used for training vs testing in each fold
    c. Perfect for educational purposes — adding more folds (like 5 or 10) would make the visualization more cluttered without adding conceptual value
    (Note: While 5-fold or 10-fold cross-validation might be more common in practice, 3-fold serves perfectly to illustrate the core concepts of the technique.)

Adding Indices for Clarity

While the concept above is correct, thinking about actual row indices makes it even clearer:

An enhanced variation with subtle indices, making it easier to see which part of the dataset each fold belongs to. The dashed lines help separate the indices.

Here's how this version improves on the previous visual:

  • Instead of just “different portions,” we can see that Fold 1 tests on rows 1–4, Fold 2 on rows 5–7, and Fold 3 on rows 8–10
  • “Complete coverage” becomes more concrete: rows 1–10 each appear exactly once in test sets
  • Training sets are explicit: when testing on rows 1–4, we’re training on rows 5–10
  • Data independence is obvious: test sets use different row ranges (1–4, 5–7, 8–10)

This index-based view doesn’t change the concepts — it just makes them more concrete and easier to implement in code. Whether you think about it as portions or specific row numbers, the key principles remain the same: independent folds, complete coverage, and using all your data.
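Here's roughly what that looks like in code, as a sketch with scikit-learn's KFold on a 10-row dataset (KFold reports 0-based positions, so we add 1 to match the diagram's row numbers):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # rows 1-10 in the diagram

tested_rows = []
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=3).split(X), start=1):
    print(f"Fold {fold}: test rows {(test_idx + 1).tolist()}, "
          f"train rows {(train_idx + 1).tolist()}")
    tested_rows.extend(test_idx + 1)

# Complete coverage: every row 1-10 appears in a test set exactly once
assert sorted(tested_rows) == list(range(1, 11))

# Fold 1: test rows [1, 2, 3, 4], train rows [5, 6, 7, 8, 9, 10]
# Fold 2: test rows [5, 6, 7], train rows [1, 2, 3, 4, 8, 9, 10]
# Fold 3: test rows [8, 9, 10], train rows [1, 2, 3, 4, 5, 6, 7]
```

The printed folds line up exactly with the diagram: rows 1–4, then 5–7, then 8–10.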

Adding Some Colors

If you feel the black-and-white version is too plain, here's another acceptable option:

A variation of the previous diagram, adding color to each fold’s number.

While using colors in this version might seem problematic given the issues with color blindness and memory load mentioned before, it can still work as a helpful teaching tool alongside the simpler version.

The main reason is that it doesn’t only use colors to show the information — the row numbers (1–10) and fold numbers tell you everything you need to know, with colors just being a nice extra touch.

This means that even if someone can’t see the colors properly or prints it in black and white, they can still understand everything through the numbers. And while having to remember what each color means can make things harder to learn, in this case you don’t have to remember the colors — they’re just there as an extra help for people who find them useful, but you can totally understand the diagram without them.

Just like the previous version, the row numbers also help by showing exactly how the data is being split up, making it easier to understand how cross-validation works in practice whether you pay attention to the colors or not.

The visualization remains fully functional and understandable even if you ignore the colors completely.

Try the challenge above. With a limited number of colors, it's easier to track how the positions change.

Let's look at why our new design makes sense not just from a UX view, but also from a data science perspective.

Matching Mental Models: Think about how you explain cross-validation to someone. You probably say “we take these rows for testing, then these rows, then these rows.” Our visualization now matches exactly how we think and talk about the process. We’re not just making it pretty, we’re making it match reality.

Data Structure Clarity: By showing data as columns with indices, we’re revealing the actual structure of our dataset. Each row has a number, each number appears in exactly one test set. This isn’t just good design, it’s accurate to how our data is organized in code.

Even with shuffling, which is common in practice, we can simply show the shuffled indices so people understand that the rows have been shuffled.
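For instance, here's a small sketch assuming scikit-learn's KFold with shuffle=True (the random_state is fixed here only so the example is reproducible): the row numbers in each test set are simply no longer consecutive, and the diagram would just display these shuffled indices instead of consecutive ranges.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=3, shuffle=True, random_state=7)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # The test rows are no longer consecutive, but each row still
    # appears in exactly one test set
    print(f"Fold {fold}: test rows {(test_idx + 1).tolist()}")
```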

Focus on What Matters: Our old way of showing cross-validation had us thinking about moving parts. But that’s not what matters in cross-validation. What matters is:

  • Which rows are we testing on?
  • Are we using all our data?
  • Is each row used for testing exactly once?

Our new design answers these questions at a glance.

Index-Based Understanding: Instead of abstract colored boxes, we’re showing actual row indices. When you write cross-validation code, you’re working with these indices. Now the visualization matches your code — Fold 1 uses rows 1–4, Fold 2 uses 5–7, and so on.

Using a similar diagram, we can also show how leave-one-out cross-validation works. Only one data point is used in the test set! The split numbering and the chosen index for the test set also match nicely.
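In code, this corresponds to scikit-learn's LeaveOneOut. Here's a minimal sketch on the same 10-row idea: 10 rows means 10 splits, each testing on a single row, and the split number lines up with the tested row.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(-1, 1)  # 10 rows -> 10 splits

for split, (train_idx, test_idx) in enumerate(LeaveOneOut().split(X), start=1):
    # Each split tests on exactly one row; the split number matches the row
    print(f"Split {split}: test row {(test_idx + 1).tolist()}, "
          f"trains on the other {len(train_idx)} rows")
```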

Clear Data Flow: The layout shows data flowing from left to right: here’s your dataset, here’s how it’s split, here’s what each split looks like. It matches the logical steps of cross-validation and it’s also easier to look at.

Using arrows only to denote the train-and-test process makes it clearer how many models are trained and what the outputs of cross-validation are. Note that no arrows connect elements between splits.
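If you want to see those outputs directly, here's a sketch using scikit-learn's cross_val_score with a placeholder decision tree and made-up data: cv=3 fits three separate models and returns three scores, one per split, and nothing is passed between the splits.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: 10 rows, 2 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Three independent models are trained, one per split
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=3)

print(scores)         # three numbers: one score per split
print(scores.mean())  # the single summary we usually report
```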

Here's what we've learned from redrawing the cross-validation diagram:

Match Your Code, Not Conventions: We usually stick to traditional ways of showing things just because that’s how everyone does it. But cross-validation is really about selecting different rows of data for testing, so why not show exactly that? When your visualization matches your code, understanding follows naturally.

Data Structure Matters: By showing indices and actual data splits, we're revealing how cross-validation really works while also making the picture clearer. Each row has its place, each split has its purpose, and you can trace exactly what's happening at each step.

Simplicity Has Its Purpose: It turns out that showing less can actually explain more. By focusing on the essential parts (which rows are being used for testing, and when), we're not just simplifying the visualization; we're also highlighting what actually matters in cross-validation.

Looking ahead, this thinking can apply to many data science concepts. Before making another visualization, ask yourself:

  • Does this show what’s actually happening in the code?
  • Can someone trace the data flow?
  • Are we showing structure, or just following tradition?

Good visualization isn’t about following rules — it’s about showing truth. And sometimes, the clearest truth is also the simplest.
