Textual Information and IPO Underpricing: A Machine Learning Approach
The Journal of Financial Data Science, volume 5, issue 2, 2023 [DOI: 10.3905/jfds.2023.1.121]
Posted: 8 Oct 2024
Date Written: October 10, 2022
Abstract
This study examines the predictive power of textual information from S-1 filings in explaining IPO underpricing. The authors' approach differs from previous research in that they use several machine learning algorithms to predict both whether an IPO will be underpriced and the magnitude of the underpricing. Using a sample of 2,481 U.S. IPOs, they find that textual information effectively complements financial variables: models that use both sources of data produce more accurate estimates. In particular, the best-performing model using only financial variables achieves 67.5% accuracy, while the best model using both textual and financial data shows a substantial improvement (6.1%). In addition, the use of sophisticated machine learning models increases predictive accuracy relative to the traditional logistic regression model (2.5%). The authors attribute these findings to the ability of textual information to reduce the ex-ante valuation uncertainty of IPO firms. Finally, they create a portfolio of IPOs based on the out-of-sample machine learning predictions, which achieves an average return of 27.90%. The portfolio earns abnormal returns over various horizons (both in the short and long run), yielding up to 30% more than the benchmark.
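The idea of complementing financial variables with textual features can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the token vocabulary, the two unnamed financial variables, and the helper names (build_vocabulary, text_features, combine, fit_logistic, predict) are all hypothetical, relative term frequencies stand in for whatever text representation the paper uses, and a plain logistic regression (the baseline named in the abstract) stands in for the more sophisticated models. The fit is checked in-sample only, so it says nothing about out-of-sample accuracy.

```python
import math
from collections import Counter

# Hypothetical toy records: (S-1 text excerpt, financial variables, underpriced label).
# These values are invented for illustration and are not from the paper's sample.
SAMPLE = [
    ("strong growth demand growth", [0.8, 0.1], 1),
    ("risk uncertainty losses risk", [0.2, 0.9], 0),
    ("growth demand expansion", [0.7, 0.2], 1),
    ("losses litigation uncertainty", [0.3, 0.8], 0),
]


def build_vocabulary(texts):
    """Return a sorted list of every distinct token in the corpus."""
    return sorted({tok for text in texts for tok in text.lower().split()})


def text_features(text, vocab):
    """Relative term frequencies of the text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[tok] / total for tok in vocab]


def combine(text, financials, vocab):
    """Concatenate the financial variables with the text features."""
    return list(financials) + text_features(text, vocab)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Classify as underpriced (1) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


vocab = build_vocabulary([text for text, _, _ in SAMPLE])
X = [combine(text, fin, vocab) for text, fin, _ in SAMPLE]
y = [label for _, _, label in SAMPLE]
w, b = fit_logistic(X, y)
preds = [predict(x, w, b) for x in X]
```

Each row of X holds the two financial variables followed by one term-frequency column per vocabulary token, so a financial-only model can be compared with the combined model by fitting on the first two columns alone.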
Keywords: Initial public offerings, First-day returns, Natural language processing, Machine learning
JEL Classification: C63, G12, G14, G40