Recovering Missing Firm Characteristics with Attention-Based Machine Learning

65 Pages. Posted: 10 Jan 2022. Last revised: 23 Jan 2023

Heiner Beckmeyer

University of Münster

Timo Wiedemann

University of Münster - Finance Center Münster

Date Written: January 23, 2023


Firm characteristics are used throughout empirical research in accounting, economics, finance, and many related fields. These characteristics are, however, frequently unobservable to the researcher, and the reasons why and when they are missing follow intricate patterns. In the past, researchers dropped firm-month observations with missing information. This approach quickly becomes infeasible as the number of characteristics grows, yet a large set of characteristics is required to assess their informational content simultaneously. A second approach that has emerged in response is to impute the cross-sectional mean, which discards important variation over time and in the cross-section. Our study recovers these missing entries by drawing on the informational content of other (observed) characteristics, their past observations, and information from the cross-section of other firms. We adapt state-of-the-art advances from natural language processing to the case of financial data and train a flexible large-scale machine learning model in a self-supervised environment. To train the model, we consider several masking types that account for empirically observed patterns of missingness. Using the uncovered latent structure governing firm characteristics, we show that our model beats competing methods, as well as several approaches tailored to the imputation of financial data. Based on the completed dataset, we show that average returns to many characteristic-sorted long-short portfolios are likely lower than previously thought. In general, the return distribution of firms with missing characteristics differs significantly from that of firms with all information available, highlighting the importance of adequately imputing missing values.
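
As a rough illustration of two ideas named in the abstract, the following Python sketch implements the cross-sectional mean baseline and a simple masking step that hides observed entries so an imputation method can be scored on values it never saw. It is not the attention-based model of the paper. The toy panel, the function names (`cross_sectional_mean_impute`, `mask_observed`, `masked_rmse`), the uniform random masking, and the RMSE criterion are all assumptions made for this example.

```python
import math
import random

# Toy panel (made-up numbers): panel[month][firm] is a list of characteristic
# values, with None marking a missing entry.
panel = {
    "2020-01": {"A": [0.2, 1.0], "B": [0.4, None], "C": [0.6, 3.0]},
    "2020-02": {"A": [0.1, None], "B": [0.3, 2.0], "C": [None, 4.0]},
}


def cross_sectional_mean_impute(panel):
    """Fill each missing entry with the same-month mean across reporting firms."""
    filled = {}
    for month, firms in panel.items():
        n_chars = len(next(iter(firms.values())))
        means = []
        for j in range(n_chars):
            observed = [v[j] for v in firms.values() if v[j] is not None]
            # Assumption: fall back to 0.0 if no firm reports this characteristic.
            means.append(sum(observed) / len(observed) if observed else 0.0)
        filled[month] = {
            firm: [means[j] if x is None else x for j, x in enumerate(values)]
            for firm, values in firms.items()
        }
    return filled


def mask_observed(panel, fraction, seed=0):
    """Hide a random fraction of observed entries; return masked copy and truths."""
    rng = random.Random(seed)
    observed = [
        (month, firm, j)
        for month, firms in panel.items()
        for firm, values in firms.items()
        for j, x in enumerate(values)
        if x is not None
    ]
    n_hide = max(1, round(fraction * len(observed)))
    hidden = rng.sample(observed, n_hide)
    # Copy the panel so the original data is left untouched.
    masked = {m: {f: list(v) for f, v in firms.items()} for m, firms in panel.items()}
    targets = {}
    for month, firm, j in hidden:
        targets[(month, firm, j)] = masked[month][firm][j]
        masked[month][firm][j] = None
    return masked, targets


def masked_rmse(imputed, targets):
    """Root mean squared error on the deliberately hidden entries only."""
    errors = [(imputed[m][f][j] - truth) ** 2 for (m, f, j), truth in targets.items()]
    return math.sqrt(sum(errors) / len(errors))


masked, targets = mask_observed(panel, fraction=0.25, seed=0)
baseline = cross_sectional_mean_impute(masked)
print(f"masked-entry RMSE of the mean baseline: {masked_rmse(baseline, targets):.3f}")
```

Because the baseline assigns every firm the same value within a month, it cannot reproduce differences across firms or over time for the hidden entries, which is the loss of variation the abstract refers to. Any alternative imputation method could be scored with the same `masked_rmse` call on the same `targets`.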

Keywords: Machine Learning, Missing Data, Big Data, Risk Factors

JEL Classification: G10, G12, G14, C1, C55

Suggested Citation

Beckmeyer, Heiner and Wiedemann, Timo, Recovering Missing Firm Characteristics with Attention-Based Machine Learning (January 23, 2023). Proceedings of the EUROFIDAI-ESSEC Paris December Finance Meeting 2022. Available at SSRN.

Heiner Beckmeyer (Contact Author)

University of Münster

Schlossplatz 2
Muenster, D-48143


Timo Wiedemann

University of Münster - Finance Center Münster

Universitätsstraße 14-16
Münster, 48143
