Unbiased Regression Trees for Longitudinal and Clustered Data
33 Pages. Posted: 24 Feb 2014. Last revised: 1 Dec 2014
Date Written: November 30, 2014
Abstract
This paper presents a new version of the RE-EM regression tree method for longitudinal and clustered data. The RE-EM tree combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. It is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. The previously suggested methodology used the CART algorithm for tree building, so that version of the RE-EM regression tree inherits CART's tendency to split on variables with more possible split points at the expense of those with fewer split points. The revised version of the RE-EM regression tree corrects for this bias by using the conditional inference tree as the underlying tree algorithm instead of CART. Simulation studies show that the new version is indeed unbiased and that it improves on the original RE-EM regression tree in prediction accuracy and in the ability to recover the true structure.
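The split-selection bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's simulation design; it is a minimal illustration under stated assumptions: the response is pure noise, one predictor is continuous (many candidate split points), the other is binary (a single split point), and the split is chosen by exhaustive search over the reduction in the sum of squared errors, which is the general style of search CART uses. All function names and parameter values here are hypothetical. An unbiased selection rule would be expected to pick each predictor about half the time.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_sse_reduction(x, y):
    """Largest SSE reduction over all binary splits of the form x <= c."""
    total = sse(y)
    best = 0.0
    # The largest observed value cannot serve as a cut point.
    for c in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        best = max(best, total - sse(left) - sse(right))
    return best


def share_choosing_many_valued(n_obs=50, n_reps=200, seed=0):
    """Fraction of replications in which exhaustive search prefers the
    continuous predictor, although neither predictor is related to y."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]      # pure noise
        x_many = [rng.random() for _ in range(n_obs)]        # many split points
        x_binary = [rng.randint(0, 1) for _ in range(n_obs)]  # one split point
        if best_sse_reduction(x_many, y) > best_sse_reduction(x_binary, y):
            wins += 1
    return wins / n_reps


share = share_choosing_many_valued()
print(f"share of noise-only splits on the many-valued predictor: {share:.2f}")
```

Because the continuous predictor gets many more chances to produce a large reduction by chance, the exhaustive search favors it well over half the time. Conditional inference trees avoid this by first selecting the split variable with a statistical test and only then searching for the split point.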
Keywords: Clustered data, Longitudinal data, Mixed effects, Regression trees