Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data


Eric M. Schwartz


University of Michigan, Stephen M. Ross School of Business

Eric Bradlow


University of Pennsylvania - Marketing Department

Peter Fader


University of Pennsylvania - Marketing Department

July 10, 2013


Abstract:     
When managers and researchers encounter a dataset, they typically ask two key questions: (1) which model (from a candidate set) should I use? and (2) if I use a particular model, when is it likely to work well for my business goal? This research addresses both questions and provides a rule, i.e., a decision tree, that lets data analysts predict the "winning model" for longitudinal incidence data before having to fit any of the candidates. We characterize datasets based on managerially relevant (and easy-to-compute) summary statistics, and we use classification techniques from machine learning to provide a decision tree that recommends when to use which model. By doing the "legwork" of obtaining this decision tree for model selection, we provide a time-saving tool to analysts. We illustrate this method for a common marketing problem (i.e., forecasting repeat purchasing incidence for a cohort of new customers) and demonstrate the method's ability to discriminate among an integrated family of models: a hidden Markov model (HMM) and its constrained variants. We observe a strong ability for dataset characteristics to guide the choice of the most appropriate model, and we observe that some model features (e.g., the "back-and-forth" migration between latent states) are more important to accommodate than others (e.g., the inclusion of an "off" state with no activity). We also demonstrate the method's broad potential by providing a general "recipe" for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing incidence data and the HMM framework).
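
The two-step idea in the abstract (summarize each dataset, then classify which model wins) can be illustrated with a minimal sketch. This is not the authors' implementation. The three summary statistics, the toy incidence matrices, and the labels "full_hmm" and "constrained" are hypothetical choices made only for illustration, and the "tree" is a single-split stump chosen by Gini impurity rather than the full classification tree described in the paper.

```python
from collections import Counter

def summarize(incidence):
    """Summary statistics for a customers-by-periods 0/1 incidence matrix.
    The particular statistics are assumptions, not the paper's feature set."""
    n_customers = len(incidence)
    n_periods = len(incidence[0])
    totals = [sum(row) for row in incidence]
    late = sum(sum(row[n_periods // 2:]) for row in incidence)
    return {
        "penetration": sum(1 for t in totals if t > 0) / n_customers,
        "mean_rate": sum(totals) / (n_customers * n_periods),
        "late_share": late / max(sum(totals), 1),
    }

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the lowest weighted Gini.
    Assumes at least one feature takes two or more distinct values."""
    best = None
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]

def recommend(stump, stats):
    """Apply the fitted split to a new dataset's summary statistics."""
    feature, threshold, left_label, right_label = stump
    return left_label if stats[feature] <= threshold else right_label

# Toy datasets with hypothetical "winning model" labels (illustration only).
datasets = [
    [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0]],
    [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 1, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]],
    [[1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]],
]
winners = ["full_hmm", "constrained", "full_hmm", "constrained"]

stump = fit_stump([summarize(d) for d in datasets], winners)
print(stump)
print(recommend(stump, summarize([[1, 1, 0, 1], [0, 1, 1, 1]])))
```

In practice the labels would come from fitting every candidate model once on many datasets and recording which one forecasts best; the fitted rule then spares analysts that work on new datasets.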

Number of Pages in PDF File: 50

Keywords: data science, business intelligence, model selection, machine learning, classification tree, posterior predictive model checking, hidden Markov models, hierarchical Bayesian methods, random forests, forecasting

JEL Classification: C11, C15, C22, C23, C51, C52, C53, M31

working papers series



Date posted: June 18, 2012; Last revised: July 11, 2013

Suggested Citation

Schwartz, Eric M. and Bradlow, Eric and Fader, Peter, Model Selection Using Database Characteristics: Developing a Classification Tree for Longitudinal Incidence Data (July 10, 2013). Available at SSRN: http://ssrn.com/abstract=2085767 or http://dx.doi.org/10.2139/ssrn.2085767

Contact Information

Eric M. Schwartz (Contact Author)
University of Michigan, Stephen M. Ross School of Business
701 Tappan Street
Ann Arbor, MI 48109
United States
Eric Bradlow
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
215-898-8255 (Phone)

Peter Fader
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States



Paper statistics
Abstract Views: 3,425
Downloads: 1,040
Download Rank: 10,559
References:  40
Citations:  1
