Replacing Cross-Validation with Interrogation: A Universal Test for Underfitting and Overfitting
16 Pages · Posted: 7 May 2025 · Last revised: 1 May 2025
Date Written: April 23, 2025
Abstract
Gauging the reliability of a prediction routine for use outside its training data is a fundamental challenge in machine learning. Model training typically relies on cross-validation to avoid overfitting, whereby alternative calibrations of a model are trained on subsamples of the available data and tested on the corresponding validation samples. However, cross-validation has inherent limitations: it cannot directly evaluate the model trained on all available data, slicing the sample reduces the statistical power of both the subsample training and the testing, and it is computationally expensive. We propose an alternative approach based on interrogating the primary model trained on all available data. In our simulations, the interrogation-based method identified near-optimal model calibrations without using any validation samples. Our model-agnostic method works by explicitly decomposing a model's prediction logic into linear, nonlinear, pairwise-interaction, and higher-order-interaction components, and then testing whether these components remain statistically identifiable in the presence of noise. Poorly calibrated models exhibit statistical problems analogous to harmful collinearity in linear regression. Interrogation can be applied in a wide variety of contexts to evaluate model reliability.
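To make the idea concrete, the following is a minimal, hypothetical Python sketch of what such an interrogation could look like: a primary model is trained on all available data, its predictions are regressed on candidate linear, nonlinear, and pairwise-interaction components, and identifiability is probed by injecting noise and inspecting a collinearity-style diagnostic (here, a condition number). The synthetic data, the choice of GradientBoostingRegressor as the primary model, the sine transforms and pairwise products used as components, and the diagnostics reported are all illustrative assumptions, not the paper's implementation.

    # Hypothetical sketch: interrogate a model trained on ALL data (no hold-out)
    # by decomposing its prediction logic into candidate components and checking
    # whether those components stay identifiable when noise is injected.
    # Illustrative only; not the authors' implementation.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Synthetic data and a "primary" model trained on all available observations.
    n, p = 2000, 4
    X = rng.normal(size=(n, p))
    y = X[:, 0] + np.sin(X[:, 1]) + 0.5 * X[:, 2] * X[:, 3] + 0.3 * rng.normal(size=n)
    model = GradientBoostingRegressor(random_state=0).fit(X, y)
    f = model.predict(X)  # the prediction logic to be interrogated

    # Candidate components of the prediction logic: linear terms, simple
    # nonlinear transforms, and pairwise products (stand-ins for interactions).
    linear = X
    nonlinear = np.column_stack([np.sin(X[:, j]) for j in range(p)])
    pairs = np.column_stack([X[:, i] * X[:, j]
                             for i in range(p) for j in range(i + 1, p)])
    Z = np.column_stack([linear, nonlinear, pairs])

    def identifiability_report(Z, f, noise_sd):
        # Regress the model's predictions, perturbed by noise, on the components.
        f_noisy = f + noise_sd * np.std(f) * rng.normal(size=f.shape)
        fit = LinearRegression().fit(Z, f_noisy)
        # A large condition number of the centered component design signals a
        # collinearity-like identifiability problem, analogous to harmful
        # collinearity in linear regression.
        cond = np.linalg.cond(Z - Z.mean(axis=0))
        r2 = fit.score(Z, f_noisy)
        return cond, r2

    for noise_sd in (0.0, 0.1, 0.5):
        cond, r2 = identifiability_report(Z, f, noise_sd)
        print(f"noise={noise_sd:.1f}  condition number={cond:.1f}  R^2={r2:.3f}")

Under these assumptions, a calibration whose components remain explanatory (high R^2) as noise grows, without a collinearity-style breakdown, would be judged more reliable than one whose decomposition degrades; the specific thresholds and diagnostics used in the paper may differ.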
Keywords: Model validation, Overfitting, Underfitting, Cross-validation, Neural network, Machine learning, Model interpretability, Agnostic model
JEL Classification: C51, C52