Ordered Correlation Forest

31 pages. Posted: 6 May 2024

Riccardo Di Francesco

University of Rome Tor Vergata - Department of Economics and Finance

Date Written: May 6, 2024


Empirical studies in various social sciences often involve categorical outcomes with inherent ordering, such as self-evaluations of subjective well-being and self-assessments in health domains. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This paper introduces a novel estimator, the ordered correlation forest, that can naturally handle non-linearities in the data and does not assume a specific error term distribution. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an “honesty” condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and the covariates’ marginal effects. Evidence from synthetic data shows that the proposed estimator achieves better prediction performance than alternative forest-based estimators and can construct valid confidence intervals for the covariates’ marginal effects.
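The idea of estimating one class probability at a time, each as a weighted average of class indicators, can be illustrated with a minimal sketch. This is not the paper's estimator: it replaces the forest-induced weights with simple nearest-neighbour weights, uses the same weights for every class, and does not implement the modified splitting criterion or the honesty condition. The function names, the choice of `k`, the final normalisation step, and the synthetic data-generating process are all assumptions made for illustration.

```python
import random


def knn_weights(x0, xs, k):
    """Stand-in for forest weights (assumption): 1/k on the k nearest points, 0 elsewhere."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    nearest = set(order[:k])
    return [1.0 / k if i in nearest else 0.0 for i in range(len(xs))]


def predict_class_probabilities(x0, xs, ys, classes, k=25):
    """One estimate per class m: the weighted average of the indicator 1(y_i == m)."""
    weights = knn_weights(x0, xs, k)
    raw = [sum(w * (y == m) for w, y in zip(weights, ys)) for m in classes]
    # With class-specific weights the raw estimates need not sum to one,
    # so they are rescaled here (a no-op in this shared-weights sketch).
    total = sum(raw)
    return [p / total for p in raw]


# Synthetic ordered outcome with three classes, driven by a latent variable
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(500)]
ys = []
for x in xs:
    latent = x + rng.gauss(0, 1)
    ys.append(1 if latent < -0.5 else (2 if latent < 0.5 else 3))

probs_low = predict_class_probabilities(-1.5, xs, ys, classes=[1, 2, 3])
probs_high = predict_class_probabilities(1.5, xs, ys, classes=[1, 2, 3])
```

In this sketch, `probs_low` and `probs_high` hold the estimated probabilities of the three ordered classes at a low and a high covariate value. Because each prediction is a weighted average of observed indicators, the same weights could in principle be reused for variance estimation, which is the role the abstract attributes to the forest-induced weights.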

Keywords: Ordered non-numeric outcomes, choice probabilities, machine learning

JEL Classification: C14, C25, C55

Suggested Citation

Di Francesco, Riccardo, Ordered Correlation Forest (May 6, 2024). CEIS Working Paper No. 577, Available at SSRN: https://ssrn.com/abstract=4818136 or http://dx.doi.org/10.2139/ssrn.4818136

Riccardo Di Francesco (Contact Author)

University of Rome Tor Vergata - Department of Economics and Finance

Via Columbia 2
Rome, Rome 00123
