Model Selection Using Database Characteristics: Classification Methods and an Application to the 'HMM and Its Children'


Eric M. Schwartz


University of Pennsylvania - Marketing Department

Eric Bradlow


University of Pennsylvania - Marketing Department

Peter Fader


University of Pennsylvania - Marketing Department

June 17, 2012


Abstract:     
When managers and researchers encounter a dataset, they typically ask two key questions: (1) which model (from a candidate set) should be used, and (2) if a particular model is used, when is it likely to work well for the business goal? This research addresses both questions and provides a rule that lets data analysts predict the "winning model" before fitting any of them. We characterize datasets using managerially relevant, easy-to-compute summary statistics, and we use classification techniques from machine learning to build a decision tree that recommends which model to use when. We illustrate this method on a common marketing problem (forecasting repeat purchasing for a cohort of new customers) and demonstrate its ability to discriminate among an integrated family of probability models that we call the "HMM and its children." We find that dataset characteristics strongly guide the choice of the most appropriate model, and that some model features (e.g., "back-and-forth" migration between latent states) are more important to accommodate than others (e.g., the inclusion of an "off" state with no activity). We also demonstrate the method's broad generality by providing directions for researchers to replicate this kind of model classification task in other managerial contexts (outside of repeat purchasing and the HMM framework).
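The abstract's workflow has two steps. First, summarize each dataset. Second, learn a classification tree that maps those summaries to the best-forecasting model. Both steps can be sketched in a few lines of Python.

This is a minimal, hypothetical illustration, not the authors' method or code. The following are all assumptions:
- the three summary statistics (`reach`, `rate`, `comeback`),
- the toy cohorts,
- the placeholder labels `model_A`/`model_B`.

The tree is also simplified. It is a single greedy CART-style tree using Gini impurity, not whatever tree or random-forest configuration the paper actually uses.

```python
from collections import Counter


def summary_stats(data):
    """Hypothetical easy-to-compute summaries of one cohort.

    `data` is a customers x periods list of 0/1 lists (1 = repeat purchase).
    Returns [reach, rate, comeback]; these statistics are illustrative
    assumptions, not the statistics used in the paper.
    """
    n_cust, n_per = len(data), len(data[0])
    # Share of customers with at least one repeat purchase.
    reach = sum(1 for row in data if any(row)) / n_cust
    # Repeat purchases per customer per period.
    rate = sum(map(sum, data)) / (n_cust * n_per)

    def comes_back(row):
        # True if the customer buys, goes quiet, then buys again: a crude
        # proxy for "back-and-forth" movement between activity states.
        bought = gap = False
        for x in row:
            if x and gap:
                return True
            if x:
                bought = True
            elif bought:
                gap = True
        return False

    comeback = sum(comes_back(row) for row in data) / n_cust
    return [reach, rate, comeback]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X, y, depth=2):
    """Greedy CART-style tree: each split minimizes weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf: the recommended model label
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority
    _, j, t, left, right = best

    def sub(idx):
        return fit_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1)

    return (j, t, sub(left), sub(right))


def predict(tree, x):
    # Internal nodes are (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


# Toy cohorts with hypothetical labels; in practice each label would come from
# fitting every candidate model and recording which one forecast best.
cohorts = [
    [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1]],
    [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]],
    [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1]],
    [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]],
]
labels = ["model_A", "model_B", "model_A", "model_B"]

tree = fit_tree([summary_stats(c) for c in cohorts], labels)
new_cohort = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 0]]
print(tree)                                      # (2, 0.0, 'model_B', 'model_A')
print(predict(tree, summary_stats(new_cohort)))  # model_A
```

The learned tree splits on `comeback` only because the toy cohorts were constructed to differ in return-after-gap behavior. The result says nothing about real data. With real cohorts, the labels would be obtained by fitting each candidate model, and a library tree or random forest with held-out validation would normally replace this hand-rolled version.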

Number of Pages in PDF File: 55

Keywords: data science, business intelligence, model selection, machine learning, classification tree, posterior predictive model checking, hidden Markov models, hierarchical Bayesian methods, random forests, forecasting

JEL Classification: C11, C15, C22, C23, C51, C52, C53, M31

working papers series



Date posted: June 18, 2012  

Suggested Citation

Schwartz, Eric M., Bradlow, Eric and Fader, Peter, Model Selection Using Database Characteristics: Classification Methods and an Application to the 'HMM and Its Children' (June 17, 2012). Available at SSRN: http://ssrn.com/abstract=2085767 or http://dx.doi.org/10.2139/ssrn.2085767

Contact Information

Eric M. Schwartz (Contact Author)
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
HOME PAGE: http://www.ericmichaelschwartz.com

Eric Bradlow
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States
215-898-8255 (Phone)

Peter Fader
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States



Paper statistics
Abstract Views: 2,214
Downloads: 714
Download Rank: 15,552
References: 44
Citations: 1
