RE-EM Trees: A New Data Mining Approach for Longitudinal Data
35 Pages Posted: 15 Jul 2009
Date Written: June 2009
Longitudinal data arise when repeated observations are available for each sampled individual. Methodologies that take this structure into account allow for systematic differences between individuals that are not related to covariates. A standard methodology in the statistics literature for this type of data is the random effects model, in which these differences between individuals are represented by so-called “effects” that are estimated from the data. This paper presents a methodology that combines the flexibility of tree-based estimation methods with the structure of random effects models for longitudinal data. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also perform extensive simulation experiments, which show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to linear models with random effects in more general situations.
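To make the idea of combining a tree with individual-level effects concrete, the following is a minimal sketch and not the paper's estimator. It assumes a single numeric covariate, a random intercept per individual, a one-split regression tree, and a crude moment-based shrinkage step in place of a fitted mixed model. The names `fit_stump` and `fit_re_em_sketch` are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_stump(xs, ys):
    """One-split regression tree on a single numeric feature (assumes >= 2 distinct x values)."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def fit_re_em_sketch(groups, xs, ys, n_iter=20):
    """Alternate between fitting the tree and re-estimating per-individual intercepts."""
    ids = sorted(set(groups))
    b = {g: 0.0 for g in ids}  # individual effects, initialised at zero
    tree = None
    for _ in range(n_iter):
        # Step 1: fit the tree to the response with the current effects removed.
        adjusted = [y - b[g] for g, y in zip(groups, ys)]
        tree = fit_stump(xs, adjusted)
        # Step 2: re-estimate the effects from the tree's residuals.
        resid = [y - tree(x) for x, y in zip(xs, ys)]
        by_group = {g: [r for gg, r in zip(groups, resid) if gg == g] for g in ids}
        means = {g: mean(rs) for g, rs in by_group.items()}
        # Crude moment estimates of within- and between-individual variance (an assumption).
        var_e = mean((r - means[g]) ** 2 for g, r in zip(groups, resid))
        grand = mean(means.values())
        avg_n = mean(len(rs) for rs in by_group.values())
        var_b = max(mean((m - grand) ** 2 for m in means.values()) - var_e / avg_n, 0.0)
        for g in ids:
            n = len(by_group[g])
            denom = n * var_b + var_e
            shrink = n * var_b / denom if denom > 0 else 0.0
            b[g] = shrink * means[g]  # shrink the group mean residual toward zero
    return tree, b
```

A prediction for an observed individual would then be `tree(x) + b[g]`, while a new individual with no history would fall back to `tree(x)` alone. That distinction is what separates this kind of model from a regression tree without random effects.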