Empirical Asset Pricing with Missing Data
41 Pages Posted: 10 Jan 2022 Last revised: 3 May 2024
Date Written: January 7, 2022
Abstract
What is the impact of missing information in empirical asset pricing? To answer this question, we first propose an overarching machine learning method that accurately imputes missing firm characteristics using each characteristic's own history, information about other characteristics, and their temporal evolution. We then document the impact of adequately accounting for missing information on asset pricing questions in three ways: first, we show that factor premia obtained from simple portfolio sorts are likely lower than previously thought; second, acknowledging that information density differs across firms allows for a more accurate description of the risk-return trade-off across stocks; third, we confirm that simple imputation techniques work as well as sophisticated methods when the imputed data are used for machine-learning-based return prediction. We argue that this is because the complexity of these prediction methods lets them handle amounts of missing information that far exceed what we observe empirically.
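The abstract contrasts the proposed imputation model with simple imputation techniques. The following is a point of reference only, not the authors' method. It is a minimal sketch of one common simple baseline, assuming a single characteristic stored as a list of monthly cross-sections. A missing value is filled with the firm's most recent observed value. If the firm has no history yet, the month's cross-sectional median is used instead. The function name `impute_panel` and the dictionary-based panel layout are hypothetical choices made for illustration.

```python
from statistics import median


def impute_panel(panel):
    """Simple baseline imputation for one firm characteristic (hypothetical helper).

    panel: list of monthly cross-sections in time order; each month maps
    firm id -> observed value, or None if the characteristic is missing.
    Returns a new list of months with missing values filled in.
    """
    imputed = []
    last_seen = {}  # firm -> most recent *observed* value (never an imputed one)
    for month in panel:
        observed = [v for v in month.values() if v is not None]
        # Cross-sectional median of this month's observed values (None if all missing)
        xs_median = median(observed) if observed else None
        filled = {}
        for firm, value in month.items():
            if value is not None:
                last_seen[firm] = value
                filled[firm] = value
            elif firm in last_seen:
                # Use the characteristic's own past: carry the last observation forward
                filled[firm] = last_seen[firm]
            else:
                # No history for this firm: fall back to the cross-sectional median
                filled[firm] = xs_median
        imputed.append(filled)
    return imputed


panel = [
    {"A": 1.0, "B": None, "C": 3.0},
    {"A": None, "B": 2.0, "C": 5.0},
    {"A": None, "B": None, "C": None},
]
for month in impute_panel(panel):
    print(month)
```

Note that this baseline looks at a characteristic's own past and at its own cross-section only. The method the abstract describes also draws on information in other characteristics and their evolution over time.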
Keywords: Missing Data, Machine Learning, IPCA, Return Prediction, Big Data
JEL Classification: G10, G12, G14, C14, C55