Training Trees on Tails with Applications to Portfolio Choice
34 Pages. Posted: 20 Jun 2019; Last revised: 24 Feb 2020
Date Written: June 12, 2019
Abstract
In this article, we investigate the impact of truncating training data when fitting regression trees. We argue that training times can be curtailed by reducing the training sample without any loss in out-of-sample accuracy, as long as the prediction model is trained on the tails of the dependent variable, that is, when ‘average’ observations have been discarded from the training sample. Filtering instances affects which features are selected to yield the splits and can help reduce overfitting by favoring predictors with monotonic impacts on the dependent variable. We test this technique in an out-of-sample portfolio selection exercise, which confirms its benefits. Our results have important implications for time-consuming tasks such as hyperparameter tuning and validation.
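As a minimal sketch of the tail-filtering idea described above, the snippet below keeps only observations whose dependent variable falls in the lower or upper quantile tails before fitting a regression tree. The function name filter_to_tails, the quantile level, the toy data, and the tree settings are illustrative assumptions, not the paper's implementation or tuned values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def filter_to_tails(X, y, keep=0.2):
    """Keep only observations whose dependent variable lies in the bottom or
    top `keep` fraction, discarding the 'average' observations in between.
    (Illustrative rule; the quantile level is an assumption.)"""
    lo, hi = np.quantile(y, [keep, 1 - keep])
    mask = (y <= lo) | (y >= hi)
    return X[mask], y[mask]

# Toy data for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=10_000)

# Train on the tails only: a smaller sample, hence faster fitting.
X_tail, y_tail = filter_to_tails(X, y, keep=0.2)
tree = DecisionTreeRegressor(max_depth=4).fit(X_tail, y_tail)

# Out-of-sample evaluation still uses the full, unfiltered test set.
X_test = rng.normal(size=(2_000, 5))
y_test = X_test[:, 0] + 0.5 * X_test[:, 1] ** 2 + rng.normal(scale=0.5, size=2_000)
print("Test R^2:", tree.score(X_test, y_test))
```

Note that the filtering is applied to the training sample only; the test set remains untouched, which mirrors the out-of-sample evaluation logic described in the abstract.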
Keywords: Decision trees; Filtering training set; Factor investing; Portfolio choice; Feature selection
JEL Classification: C40; G11; G12