# Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios

42 pages. Posted: 24 Sep 2020. Last revised: 5 Oct 2020.

## Andrii Babii

University of North Carolina at Chapel Hill

## Ryan T. Ball

The Stephen M. Ross School of Business at the University of Michigan

## Eric Ghysels

University of North Carolina Kenan-Flagler Business School; University of North Carolina (UNC) at Chapel Hill - Department of Economics

## Jonas Striaukas

UC Louvain and F.R.S.-FNRS; Louvain Finance

Date Written: August 6, 2020

### Abstract

This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of mixed-frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes, which may be of independent interest in other high-dimensional panel data settings.
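For orientation, the sparse-group LASSO mentioned above can be sketched in its generic pooled form. The notation here is an assumption for illustration and is not taken from the abstract: $y_{i,t}$ is the target for firm $i$ at time $t$, $x_{i,t}$ collects the covariates (including lags of the higher-frequency series), $\mathcal{G}$ is a partition of the coefficients into groups, $\lambda \ge 0$ is the penalty level, and $\alpha \in [0,1]$ weights the two penalty terms. The paper's own scaling and notation may differ.

$$
\hat{\beta} = \arg\min_{b} \; \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( y_{i,t} - x_{i,t}^{\top} b \right)^{2} + \lambda \left( \alpha \, \lVert b \rVert_{1} + (1-\alpha) \sum_{G \in \mathcal{G}} \lVert b_{G} \rVert_{2} \right)
$$

Under this sketch, $\alpha = 1$ gives the standard LASSO and $\alpha = 0$ gives the group LASSO; intermediate values encourage sparsity both across groups and within them. A natural grouping in the mixed-frequency setting is one group per covariate, containing all of that covariate's lag coefficients.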

Keywords: corporate earnings, nowcasting, high-dimensional panels, mixed frequency data, text data, sparse-group LASSO, heavy-tailed $\tau$-mixing processes, Fuk-Nagaev inequality

JEL Classification: C22, C51, C52, C53, C55, C58, G17

