
Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data

50 Pages. Posted: 18 Jun 2012. Last revised: 11 Jul 2013.

Eric M. Schwartz

University of Michigan, Stephen M. Ross School of Business

Eric Bradlow

University of Pennsylvania - Marketing Department

Peter Fader

University of Pennsylvania - Marketing Department

Date Written: July 10, 2013

Abstract

When managers and researchers encounter a dataset, they typically ask two key questions: (1) which model (from a candidate set) should I use? and (2) if I use a particular model, when is it likely to work well for my business goal? This research addresses those two questions and provides a rule, i.e., a decision tree, that lets data analysts predict the "winning model" for longitudinal incidence data before having to fit any of the candidates. We characterize datasets based on managerially relevant (and easy-to-compute) summary statistics, and we use classification techniques from machine learning to provide a decision tree that recommends when to use which model. By doing the "legwork" of obtaining this decision tree for model selection, we provide a time-saving tool to analysts. We illustrate this method for a common marketing problem (i.e., forecasting repeat purchasing incidence for a cohort of new customers) and demonstrate the method's ability to discriminate among an integrated family of models comprising a hidden Markov model (HMM) and its constrained variants. We observe a strong ability for dataset characteristics to guide the choice of the most appropriate model, and we observe that some model features (e.g., the "back-and-forth" migration between latent states) are more important to accommodate than others (e.g., the inclusion of an "off" state with no activity). We also demonstrate the method's broad potential by providing a general "recipe" for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing incidence data and the HMM framework).
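As a rough illustration of the idea described in the abstract (summarize each dataset with a few cheap statistics, then learn a tree that maps those statistics to the model that won), the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the three summary statistics, the toy incidence matrices, the labels `"constrained"` and `"full_hmm"`, and the use of a single-split tree (a decision stump) chosen by Gini impurity. In a real application the labels would presumably come from fitting every candidate model to each training dataset and recording the best performer.

```python
from collections import Counter


def summarize(incidence):
    """Cheap summary statistics for a customers x periods 0/1 incidence matrix.

    The choice of statistics is hypothetical, not taken from the paper.
    """
    n, periods = len(incidence), len(incidence[0])
    total = sum(sum(row) for row in incidence)
    # Adjacent period pairs in which a customer flips between active and inactive.
    switches = sum(row[t] != row[t + 1] for row in incidence for t in range(periods - 1))
    return {
        "penetration": sum(1 for row in incidence if any(row)) / n,
        "mean_rate": total / (n * periods),
        "switch_rate": switches / (n * (periods - 1)),
    }


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(features, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for name in features[0]:
        values = sorted({f[name] for f in features})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for f, y in zip(features, labels) if f[name] <= thr]
            right = [y for f, y in zip(features, labels) if f[name] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, name, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("no feature varies across the training datasets")
    return best[1:]  # (feature name, threshold, left label, right label)


def recommend(stump, feats):
    """Recommend a model label for a new dataset from its summary statistics."""
    name, thr, left_label, right_label = stump
    return left_label if feats[name] <= thr else right_label


# Toy training datasets (hypothetical) with hypothetical "winning model" labels.
steady = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
steady2 = [[1, 1, 1, 1], [1, 1, 1, 1]]
flippy = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
flippy2 = [[0, 1, 0, 0], [0, 0, 0, 0]]

train = [summarize(d) for d in (steady, steady2, flippy, flippy2)]
winners = ["constrained", "constrained", "full_hmm", "full_hmm"]

stump = fit_stump(train, winners)
new_dataset = [[1, 0, 0, 1], [0, 1, 1, 0]]
print(stump[0], recommend(stump, summarize(new_dataset)))
```

A deeper tree or a random forest would follow the same pattern with recursive or repeated splitting; the stump is used here only to keep the sketch short and dependency-free.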

Keywords: data science, business intelligence, model selection, machine learning, classification tree, posterior predictive model checking, hidden Markov models, hierarchical Bayesian methods, random forests, forecasting

JEL Classification: C11, C15, C22, C23, C51, C52, C53, M31

Suggested Citation

Schwartz, Eric M. and Bradlow, Eric and Fader, Peter, Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data (July 10, 2013). Available at SSRN: https://ssrn.com/abstract=2085767 or http://dx.doi.org/10.2139/ssrn.2085767

Eric M. Schwartz (Contact Author)

University of Michigan, Stephen M. Ross School of Business

701 Tappan Street
Ann Arbor, MI 48109
United States

Eric Bradlow

University of Pennsylvania - Marketing Department

700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
215-898-8255 (Phone)

Peter Fader

University of Pennsylvania - Marketing Department

700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
