How to Deal with Small Data Sets in Machine Learning: An Analysis on the CAT Bond Market
28 Pages. Posted: 26 Feb 2020
Date Written: January 30, 2020
This study compares state-of-the-art regression-based models to machine learning methods in terms of forecasting performance in asset pricing on a small data set. The performance comparison is conducted on the market for CAT bonds, where we use a large sample of CAT bond issues to forecast risk premia. First, we evaluate the performance of regression models based on the literature. We then test whether the accuracy of those models can be improved through different variable selection algorithms or penalization methods. Afterwards, we use the machine learning methods random forest and neural networks to forecast CAT bond premia. We obtain three main results. First, the application of selection and penalization methods to linear regression models yields only minor differences in forecasting performance. Second, random forest outperforms regression models in terms of forecasting performance. Third, machine learning methods perform quite well on a relatively small data set.
Keywords: CAT bonds, machine learning, regression, risk premium
JEL Classification: C45, C58, G12, G17, G22