On the (Mis)Use of Machine Learning with Panel Data

37 pages. Posted: 19 Nov 2024. Last revised: 5 May 2025.

Augusto Cerqua

Sapienza University of Rome, Department of Social Sciences and Economics

Marco Letta

Sapienza University of Rome - Department of Social Sciences and Economics

Gabriele Pinto

Sapienza University of Rome, Department of Social Sciences and Economics

Date Written: November 08, 2024

Abstract

We provide the first systematic assessment of data leakage issues in the use of machine learning on panel data. Our organizing framework clarifies why neglecting the cross-sectional and longitudinal structure of these data leads to hard-to-detect data leakage, inflated out-of-sample performance, and an inadvertent overestimation of the real-world usefulness and applicability of machine learning models. We then offer empirical guidelines for practitioners to ensure the correct implementation of supervised machine learning in panel data environments. An empirical application, using data from over 3,000 U.S. counties spanning 2000-2019 and focused on income prediction, illustrates the practical relevance of these points across nearly 500 models for both classification and regression tasks.
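To make the leakage mechanism concrete, the following is a minimal sketch, not the authors' procedure. It contrasts a naive random split with two splits that respect the panel structure: one along the time dimension and one along the unit dimension. The toy panel and all function names (`make_panel`, `random_split`, `temporal_split`, `unit_split`, `has_temporal_leak`, `shared_units`) are hypothetical and introduced only for illustration; the abstract does not say which splitting schemes the paper recommends.

```python
import random


def make_panel(n_units=5, years=range(2000, 2010)):
    """Hypothetical toy panel: one (unit_id, year) row per unit and year."""
    return [(u, y) for u in range(n_units) for y in years]


def random_split(rows, test_share=0.2, seed=0):
    """Naive i.i.d.-style split that ignores the panel structure."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(rows, cutoff_year):
    """Train on years up to and including cutoff_year, test on later years."""
    train = [r for r in rows if r[1] <= cutoff_year]
    test = [r for r in rows if r[1] > cutoff_year]
    return train, test


def unit_split(rows, test_units):
    """Hold out entire units so that no unit appears on both sides."""
    test_units = set(test_units)
    train = [r for r in rows if r[0] not in test_units]
    test = [r for r in rows if r[0] in test_units]
    return train, test


def has_temporal_leak(train, test):
    """True if some training row is dated at or after the earliest test row."""
    return max(y for _, y in train) >= min(y for _, y in test)


def shared_units(train, test):
    """Units that appear in both the training and the test set."""
    return {u for u, _ in train} & {u for u, _ in test}


if __name__ == "__main__":
    panel = make_panel()

    train, test = random_split(panel)
    print("random split, temporal leak:", has_temporal_leak(train, test))
    print("random split, shared units:", sorted(shared_units(train, test)))

    train, test = temporal_split(panel, cutoff_year=2007)
    print("temporal split, temporal leak:", has_temporal_leak(train, test))

    train, test = unit_split(panel, test_units=[0, 1])
    print("unit split, shared units:", sorted(shared_units(train, test)))
```

Which split is appropriate depends on the prediction task: forecasting future outcomes for known units calls for a time-based holdout, while predicting for units never seen in training calls for holding out whole units.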

Keywords: prediction policy problems, panel data, data leakage, machine learning

JEL Classification: C33, C53

Suggested Citation

Cerqua, Augusto and Letta, Marco and Pinto, Gabriele, On the (Mis)Use of Machine Learning with Panel Data (November 08, 2024). Available at SSRN: https://ssrn.com/abstract=5014594 or http://dx.doi.org/10.2139/ssrn.5014594

Augusto Cerqua (Contact Author)

Sapienza University of Rome, Department of Social Sciences and Economics

P.le Aldo Moro 5
Rome, 00185
Italy

Marco Letta

Sapienza University of Rome - Department of Social Sciences and Economics

Italy

Gabriele Pinto

Sapienza University of Rome, Department of Social Sciences and Economics

Piazzale Aldo Moro 5
Rome, 00185
Italy

Paper statistics

Downloads: 141
Abstract Views: 710
Rank: 444,231