Customer-Base Analysis on a 'Data Diet': Model Inference Using Repeated Cross-Sectional Summary (RCSS) Data


Kinshuk Jerath


Columbia University - Columbia Business School

Peter Fader


University of Pennsylvania - Marketing Department

Bruce Hardie


London Business School

December 21, 2013


Abstract:
We address a critical question that many firms are facing in this era of "big data": Can customer data be stored and analyzed in an easy-to-manage and scalable manner without significantly compromising the inferences that can be made about the customers' transaction activity? We address this question in the context of customer-base analysis. A number of researchers have developed customer-base analysis models that perform very well given detailed individual-level data. We explore the possibility of estimating these models using aggregated data summaries alone, namely repeated cross-sectional summaries (RCSS) of the transaction data (e.g., four quarterly histograms). Such summaries are easy to create, visualize, and distribute, irrespective of the size of the customer base. An added advantage of RCSS data is that individual customers cannot be identified, which makes it desirable from a privacy viewpoint as well. We focus on the widely used Pareto/NBD model and carry out a comprehensive simulation study covering a vast spectrum of market scenarios. Our results consistently and convincingly establish that model performance associated with the use of three or four cross-sections of RCSS data (as judged by model fit, parameter recovery, and forward-looking metrics of customer value) can closely match the model performance associated with the use of individual-level data. We confirm the results of the simulations on a real dataset of purchases from an online fashion retailer. The thesis of our approach is that existing statistical models continue to have value in a "big data" world, but to harness this value one may want to approach estimation of these models in a different manner.
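
As an illustration of what a repeated cross-sectional summary might look like in practice, the following minimal Python sketch collapses an individual-level transaction log into one histogram per period (for example, four quarterly histograms). It is not taken from the paper: the function name `rcss_histograms`, its parameters, the input layout of `(customer_id, time)` pairs, and the choice to pool high counts into a final right-censored bin are all assumptions made for this example.

```python
from collections import Counter


def rcss_histograms(transactions, customers, n_periods, period_length, max_bin):
    """Build one histogram of per-customer transaction counts for each period.

    Hypothetical helper, not the authors' code.
    transactions  : iterable of (customer_id, time), time measured from 0
    customers     : iterable of all customer ids in the cohort
    n_periods     : number of cross-sections (e.g. 4 quarters)
    period_length : length of each period in the same units as `time`
    max_bin       : counts >= max_bin are pooled into the last bin (assumption)

    Returns a list of n_periods lists; entry [p][k] is the number of
    customers who made k transactions in period p.
    """
    # Per-period tally of transactions by customer.
    counts = [Counter() for _ in range(n_periods)]
    for customer_id, time in transactions:
        period = int(time // period_length)
        if 0 <= period < n_periods:
            counts[period][customer_id] += 1

    customers = list(customers)
    histograms = []
    for period in range(n_periods):
        hist = [0] * (max_bin + 1)
        for customer_id in customers:
            # Customers with no purchases in the period land in bin 0.
            k = min(counts[period][customer_id], max_bin)
            hist[k] += 1
        histograms.append(hist)
    return histograms


# Toy data: three customers observed over four unit-length periods.
toy_customers = ["a", "b", "c"]
toy_transactions = [
    ("a", 0.5), ("a", 1.2),
    ("b", 0.1), ("b", 0.2), ("b", 0.3),
    ("c", 3.9),
]
toy_rcss = rcss_histograms(toy_transactions, toy_customers,
                           n_periods=4, period_length=1.0, max_bin=2)
```

The output keeps only counts of customers per bin, so customer identifiers do not appear in the summary and its size does not grow with the customer base. This matches the scalability and privacy properties described above.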

Number of Pages in PDF File: 38

Keywords: Customer-base analysis, probability models, Pareto/NBD, scalability, data aggregation, information loss

JEL Classification: C15, C23, C24, C51, C53, C81, M31

working papers series






Date posted: November 14, 2010 ; Last revised: December 22, 2013

Suggested Citation

Jerath, Kinshuk and Fader, Peter and Hardie, Bruce, Customer-Base Analysis on a 'Data Diet': Model Inference Using Repeated Cross-Sectional Summary (RCSS) Data (December 21, 2013). Available at SSRN: http://ssrn.com/abstract=1708562 or http://dx.doi.org/10.2139/ssrn.1708562

Contact Information

Kinshuk Jerath
Columbia University - Columbia Business School
3022 Broadway
New York, NY 10027
United States

Peter Fader (Contact Author)
University of Pennsylvania - Marketing Department
700 Jon M. Huntsman Hall
3730 Walnut Street
Philadelphia, PA 19104-6340
United States

Bruce Hardie
London Business School
Regent's Park
London, NW1 4SA
United Kingdom


