Customer-Base Analysis on a 'Data Diet': Model Inference Using Repeated Cross-Sectional Summary (RCSS) Data
Carnegie Mellon University - David A. Tepper School of Business
University of Pennsylvania - Marketing Department
London Business School
January 7, 2013
We address a critical question that many firms are facing today: Can customer data be stored and analyzed in an easy-to-manage and scalable manner without significantly compromising the inferences that can be made about the customers' transaction activity? We address this question in the context of customer-base analysis. A number of researchers have developed customer-base analysis models that perform very well given detailed individual-customer-level data. We explore the possibility of estimating these models using data summaries. We use repeated cross-sectional summaries (RCSS) of the transaction data (e.g., four quarterly histograms). Such summaries are easy to create, visualize and distribute, irrespective of the size of the customer base. An added advantage of RCSS data is that individual customers cannot be identified, which makes it desirable from a privacy viewpoint as well. We focus on the widely used Pareto/NBD model and carry out a comprehensive simulation study covering a vast spectrum of market scenarios. Our results consistently and convincingly establish that model performance associated with the use of three or four cross-sections of RCSS data (in terms of the model fit, parameter values and forward-looking metrics of customer value) can closely match the model performance associated with the use of individual-level data.
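To make the idea of RCSS data concrete, the following is a minimal sketch of how per-period histograms could be built from an individual-level transaction log. It is an illustration only, not the authors' procedure: the function name `rcss_histograms`, its parameters, the right-censored top bin and the toy data are all assumptions introduced here.

```python
from collections import Counter


def rcss_histograms(transactions, customer_ids, period_length, n_periods, max_bin):
    """Summarize a transaction log as one histogram per period.

    hist[k] is the number of customers with k transactions in that period;
    the last bin (index max_bin) collects customers with max_bin or more.
    All names and the binning rule are hypothetical, chosen for illustration.
    """
    histograms = []
    for p in range(n_periods):
        start, end = p * period_length, (p + 1) * period_length
        # Count transactions per customer that fall in [start, end).
        per_customer = Counter(cid for cid, t in transactions if start <= t < end)
        hist = [0] * (max_bin + 1)
        for cid in customer_ids:
            # Customers with no transactions in the period land in bin 0.
            k = min(per_customer.get(cid, 0), max_bin)
            hist[k] += 1
        histograms.append(hist)
    return histograms


# Toy data (made up): (customer id, transaction time in quarters).
transactions = [("a", 0.5), ("a", 1.2), ("b", 0.1), ("b", 0.2), ("b", 0.9), ("c", 3.5)]
customers = ["a", "b", "c", "d"]
hists = rcss_histograms(transactions, customers, period_length=1.0, n_periods=4, max_bin=2)
for quarter, hist in enumerate(hists, start=1):
    print(f"Q{quarter}: {hist}")
```

Each histogram has a fixed number of bins regardless of how many customers are in the cohort, and it records only counts of customers, not identities. The sketch does not attempt the model estimation step itself.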
Number of Pages in PDF File: 31
Keywords: Customer-base analysis, probability models, Pareto/NBD, scalability, data aggregation, information loss
JEL Classification: C15, C23, C24, C51, C53, C81, M31
Working Papers Series
Date posted: November 14, 2010; Last revised: January 8, 2013