Machine Learning Meets Panel Data: What Practitioners Need to Know

AiNEWS2025 by AiNEWS2025
2025-10-18
in Machine Learning

Authors: Augusto Cerqua, Marco Letta, Gabriele Pinto

Machine learning (ML) has gained a central role in economics, the social sciences, and business decision-making. In the public sector, ML is increasingly used for so-called prediction policy problems: settings where policymakers aim to identify units most at risk of a negative outcome and intervene proactively; for instance, targeting public subsidies, predicting local recessions, or anticipating migration patterns. In the private sector, similar predictive tasks arise when firms seek to forecast customer churn or optimize credit risk assessment. In both domains, better predictions translate into more efficient allocation of resources and more effective interventions.

To achieve these goals, ML algorithms are increasingly applied to panel data, characterized by repeated observations of the same units over multiple time periods. However, ML models were not originally designed for use with panel data, which feature distinctive cross-sectional and longitudinal dimensions. When ML is applied to panel data, there is a high risk of a subtle but serious problem: data leakage. This occurs when information unavailable at prediction time accidentally enters the model training process, inflating predictive performance. In our paper “On the (Mis)Use of Machine Learning With Panel Data” (Cerqua, Letta, and Pinto, 2025), recently published in the Oxford Bulletin of Economics and Statistics, we provide the first systematic assessment of data leakage in ML with panel data, propose clear guidelines for practitioners, and illustrate the consequences through an empirical application with publicly available U.S. county data.

The Leakage Problem

Panel data combine two structures: a temporal dimension (units observed across time) and a cross-sectional dimension (multiple units, such as regions or firms). Standard ML practice, splitting the sample randomly into training and testing sets, implicitly assumes independent and identically distributed (i.i.d.) data. This assumption is violated when default ML procedures (such as a random split) are applied to panel data, creating two main types of leakage:

  • Temporal leakage: future information leaks into the model during the training phase, making forecasts look unrealistically accurate. Furthermore, past information can end up in the testing set, making ‘forecasts’ retrospective.
  • Cross-sectional leakage: the same or very similar units appear in both training and testing sets, meaning the model has already “seen” most of the cross-sectional dimension of the data.

Figure 1 shows how different splitting strategies affect the risk of leakage. A random split at the unit–time level (Panel A) is the most problematic, as it introduces both temporal and cross-sectional leakage. Alternatives such as splitting by units (Panel B), by groups (Panel C), or by time (Panel D) mitigate one type of leakage but not the other. As a result, no strategy completely eliminates the problem: the appropriate choice depends on the task at hand (see below), since in some cases one form of leakage may not be a real concern.

Figure 1  |  Training and testing sets under different splitting rules

Notes: In this example, the panel data are structured with years as the time variable, counties as the unit variable, and states as the group variable. Image made by the authors.
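
The four splitting rules can be made concrete on a toy panel. The sketch below is illustrative only: the panel (3 hypothetical states, 4 counties each, 5 years), the helper names `split` and `diagnose`, the one-third hold-out share, and the 2018 cutoff are all assumptions made for the example, not the paper's code or data. The temporal flag is a simple proxy: it reports whether the training set contains any year at or after the earliest test year.

```python
import random

# Hypothetical toy panel: 3 states x 4 counties x 5 years (not the paper's data).
STATES = ["A", "B", "C"]
panel = [
    {"state": s, "county": f"{s}{c}", "year": y}
    for s in STATES for c in range(4) for y in range(2015, 2020)
]

# What gets held out as a block under each rule (panels as in Figure 1).
KEY = {
    "random": lambda r: (r["county"], r["year"]),  # Panel A: single unit-time cells
    "unit": lambda r: r["county"],                 # Panel B: whole counties
    "group": lambda r: r["state"],                 # Panel C: whole states
}

def split(rows, rule, rng, cutoff=2018):
    """Return (train, test) under one of the four splitting rules."""
    if rule == "time":                             # Panel D: later years are the test set
        is_test = lambda r: r["year"] >= cutoff
    else:
        key = KEY[rule]
        keys = sorted({key(r) for r in rows})
        held_out = set(rng.sample(keys, k=max(1, len(keys) // 3)))
        is_test = lambda r: key(r) in held_out
    train = [r for r in rows if not is_test(r)]
    test = [r for r in rows if is_test(r)]
    return train, test

def diagnose(train, test):
    """Flag the two leakage types for a given train/test pair."""
    shared_units = {r["county"] for r in train} & {r["county"] for r in test}
    # Proxy for temporal leakage: training data reach into the test period.
    future_in_train = max(r["year"] for r in train) >= min(r["year"] for r in test)
    return {"cross_sectional": bool(shared_units), "temporal": future_in_train}

rng = random.Random(0)
report = {rule: diagnose(*split(panel, rule, rng))
          for rule in ["random", "unit", "group", "time"]}
for rule, flags in report.items():
    print(rule, flags)
```

The unit and group splits clear the cross-sectional flag but leave the temporal one set, the time split does the reverse, and the random unit-time split sets both.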

Two Types of Prediction Policy Problems

A key insight of the study is that researchers must clearly define their prediction goal ex-ante. We distinguish two broad classes of prediction policy problems:

1. Cross-sectional prediction: The task is to map outcomes across units in the same period. For example, imputing missing data on GDP per capita across regions when only some regions have reliable measurements. The best split here is at the unit level: different units are assigned to training and testing sets, while all time periods are kept. This eliminates cross-sectional leakage, although temporal leakage remains. But since forecasting is not the goal, this is not a real issue.

2. Sequential forecasting: The goal is to predict future outcomes based on historical data—for example, predicting county-level income declines one year ahead to trigger early interventions. Here, the correct split is by time: earlier periods for training, later periods for testing. This avoids temporal leakage but not cross-sectional leakage, which is not a real concern since the same units are being forecasted across time.

The wrong approach in both cases is the random split by unit-time (Panel A of Figure 1), which contaminates results with both types of leakage and produces misleadingly high performance metrics.
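
The two designs can also be written down compactly. The notation below is an illustrative assumption and does not come from the paper: y_{it} is the outcome of unit i in period t, x_{it} the predictors, N the number of units, T the number of periods, and T_0 the last training period.

```latex
% Cross-sectional prediction: split by units, keep all periods.
\begin{align*}
\mathcal{D}_{\mathrm{train}} &= \{(x_{it}, y_{it}) : i \in U_{\mathrm{train}},\ t = 1, \dots, T\}, \\
\mathcal{D}_{\mathrm{test}}  &= \{(x_{it}, y_{it}) : i \in U_{\mathrm{test}},\ t = 1, \dots, T\},
\qquad U_{\mathrm{train}} \cap U_{\mathrm{test}} = \emptyset .
\end{align*}

% Sequential forecasting: split by time, keep all units, use lagged predictors.
\begin{align*}
\mathcal{D}_{\mathrm{train}} &= \{(x_{i,t-1}, y_{it}) : i = 1, \dots, N,\ t \le T_0\}, \\
\mathcal{D}_{\mathrm{test}}  &= \{(x_{i,t-1}, y_{it}) : i = 1, \dots, N,\ t > T_0\} .
\end{align*}
```

A random unit-time split satisfies neither condition: units are shared between the two sets, and training periods are interleaved with testing periods.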

Practical Guidelines

To help practitioners, we summarize a set of do’s and don’ts for applying ML to panel data:

  • Choose the sample split based on the research question: unit-based for cross-sectional problems, time-based for forecasting.
  • Temporal leakage can occur not only through observations, but also through predictors. For forecasting, only use lagged or time-invariant predictors. Using contemporaneous variables (e.g., using unemployment in 2014 to predict income in 2014) is conceptually wrong and creates temporal data leakage.
  • Adapt cross-validation to panel data. Random k-fold CV found in most ready-to-use software packages is inappropriate, as it mixes future and past information. Instead, use rolling or expanding windows for forecasting, or stratified CV by units/groups for cross-sectional prediction.
  • Ensure that out-of-sample performance is tested on truly unseen data, not on data already encountered during training.
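
Two of these guidelines, lagged predictors and panel-aware cross-validation, are easy to sketch. The example below uses only the Python standard library and hypothetical names (`add_lag`, `expanding_window_folds`, `grouped_folds`, a made-up `unemployment` variable); it is one possible reading of the guidelines, not the paper's implementation. Libraries such as scikit-learn ship related utilities (for example `TimeSeriesSplit` and `GroupKFold`), whose exact behaviour on panel data should be checked before use.

```python
def add_lag(rows, var, lag=1):
    """Attach each unit's value of `var` from `lag` years earlier; drop rows without one."""
    lookup = {(r["unit"], r["year"]): r[var] for r in rows}
    out = []
    for r in rows:
        past = lookup.get((r["unit"], r["year"] - lag))
        if past is not None:
            out.append({**r, f"{var}_lag{lag}": past})
    return out

def expanding_window_folds(years, min_train=3):
    """Yield (train_years, test_year): always train on every year before the test year."""
    years = sorted(set(years))
    for k in range(min_train, len(years)):
        yield years[:k], years[k]

def grouped_folds(units, n_folds=3):
    """Yield (train_units, held_out_units): whole units are held out together."""
    units = sorted(set(units))
    for k in range(n_folds):
        held_out = units[k::n_folds]
        yield [u for u in units if u not in held_out], held_out

# Hypothetical toy panel: 3 units observed from 2010 to 2015.
rows = [
    {"unit": u, "year": y, "unemployment": 5.0 + u + 0.1 * (y - 2010)}
    for u in range(3) for y in range(2010, 2016)
]

lagged = add_lag(rows, "unemployment")  # 2010 rows are dropped: no earlier value exists
time_folds = list(expanding_window_folds(r["year"] for r in lagged))
unit_folds = list(grouped_folds(r["unit"] for r in lagged))

print(time_folds)  # for sequential forecasting
print(unit_folds)  # for cross-sectional prediction
```

For forecasting, a model would be fit on the rows whose year falls in `train_years`, using `unemployment_lag1` rather than the contemporaneous value, and scored on `test_year`. For cross-sectional prediction, the unit-level folds keep every observation of a held-out unit out of training.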

Empirical Application

To illustrate these issues, we analyze a balanced panel of 3,058 U.S. counties from 2000 to 2019, focusing exclusively on sequential forecasting. We consider two tasks: a regression problem—forecasting per capita income—and a classification problem—forecasting whether income will decline in the subsequent year.

We run hundreds of models, varying split strategies, use of contemporaneous predictors, inclusion of lagged outcomes, and algorithms (Random Forest, XGBoost, Logit, and OLS). This comprehensive design allows us to quantify how leakage inflates performance. Figure 2 below reports our main findings.

Panel A of Figure 2 shows forecasting performance for classification tasks. Random splits yield very high accuracy, but this is illusory: the model has already seen similar data during training.

Panel B shows forecasting performance for regression tasks. Once again, random splits make models look far better than they really are, while correct time-based splits show much lower, yet realistic, accuracy.

Figure 2  |  Temporal leakage in the forecasting problem (Panel A: classification task; Panel B: regression task)

In the paper, we also show that the overestimation of model accuracy becomes significantly more pronounced during years marked by distribution shifts and structural breaks, such as the Great Recession, making the results particularly misleading for policy purposes.
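
The mechanism behind this inflation can be reproduced on a deliberately tiny synthetic panel. The sketch below is a toy under stated assumptions, not the paper's data, models, or results: 50 hypothetical units, a made-up common shock that grows until 2007 and drops sharply in 2008 and 2009, and a naive nearest-year predictor standing in for a learning algorithm. Both splits are scored on the same break years, so the only difference is whether training data from those years were available.

```python
import random
from statistics import mean

YEARS = list(range(2000, 2010))
# Hypothetical common shock: steady growth, then a sharp drop in 2008-2009.
SHOCK = dict(zip(YEARS, [0, 1, 2, 3, 4, 5, 6, 7, 2, 1]))
UNITS = list(range(50))
BREAK_YEARS = {2008, 2009}

def outcome(unit, year):
    # Unit-specific level plus the common shock; no noise, to keep the toy transparent.
    return 10 * unit + SHOCK[year]

def predict(unit, year, train):
    """Naive predictor: average the unit's outcomes in its closest training year(s)."""
    years = [t for (u, t) in train if u == unit]
    gap = min(abs(t - year) for t in years)
    return mean([outcome(unit, t) for t in years if abs(t - year) == gap])

def mae(test, train):
    return mean([abs(outcome(u, t) - predict(u, t, train)) for (u, t) in test])

cells = [(u, t) for u in UNITS for t in YEARS]

# Time-based split: train on 2000-2007, test on the break years.
train_time = {c for c in cells if c[1] not in BREAK_YEARS}
test_time = [c for c in cells if c[1] in BREAK_YEARS]
mae_time = mae(test_time, train_time)

# Random unit-time hold-out: two randomly chosen years per unit go to the test set.
rng = random.Random(0)
held_out = {(u, t) for u in UNITS for t in rng.sample(YEARS, 2)}
train_random = set(cells) - held_out
# Score only the held-out cells that fall in the break years, for a like-for-like comparison.
test_random = [c for c in held_out if c[1] in BREAK_YEARS]
mae_random = mae(test_random, train_random)

print("MAE on break years, time split:  ", mae_time)
print("MAE on break years, random split:", mae_random)
```

In this toy setup the time-based split yields a mean absolute error of 5.5 on the break years. The random hold-out scores lower because, for most units, an observation from 2008 or 2009 is already in the training set, which is exactly the information a real forecast made in 2007 would not have had.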

Why It Matters

Data leakage is more than a technical pitfall; it has real-world consequences. In policy applications, a model that seems highly accurate during validation may collapse once deployed, leading to misallocated resources, missed crises, or misguided targeting. In business settings, the same issue can translate into poor investment decisions, inefficient customer targeting, or false confidence in risk assessments. The danger is especially acute when machine learning models are intended to serve as early-warning systems, where misplaced trust in inflated performance can result in costly failures.

By contrast, properly designed models, even if less accurate on paper, provide honest and reliable predictions that can meaningfully inform decision-making.

Takeaway

ML has the potential to transform decision-making in both policy and business, but only if applied correctly. Panel data offer rich opportunities, yet are especially vulnerable to data leakage. To generate reliable insights, practitioners should align their ML workflow with the prediction objective, account for both temporal and cross-sectional structures, and use validation strategies that prevent overoptimistic assessments and an illusion of high accuracy. When these principles are followed, models avoid the trap of inflated performance and instead provide guidance that genuinely helps policymakers allocate resources and businesses make sound strategic choices. Given the rapid adoption of ML with panel data in both public and private domains, addressing these pitfalls is now a pressing priority for applied research.

References

A. Cerqua, M. Letta, and G. Pinto, “On the (Mis)Use of Machine Learning With Panel Data”, Oxford Bulletin of Economics and Statistics (2025): 1–13, https://doi.org/10.1111/obes.70019.
