The Recoverability of Segmentation Structure from Store-Level Aggregate Data
Journal of Marketing Research, Forthcoming
37 pages. Posted: 29 Mar 2004
We focus on the problem of estimating a latent class choice model with consumer response segments when only store-level aggregate data are available. Most methodologies proposed in the marketing literature require household panel data. This requirement is often hard to meet because of (a) the difficulty of recruiting panelists who are representative of the consumer population, and (b) insufficient coverage of specific geographic markets by household panels. A growing stream of work in marketing and in empirical industrial organization estimates segmentation structure from aggregate data. Such methodologies are attractive because they require only data readily available to retailers, namely transactions recorded at checkout without tracking individual customers. Among marketing scientists, beliefs about the recoverability of segmentation structure from aggregate data span a wide spectrum. They range from incredulity, based on the idea that aggregation destroys all information about differences among panelists, to unquestioning confidence, in which segmentation estimates based on aggregate data are taken at face value and used to derive managerial implications. This paper is a careful attempt to understand the extent to which disaggregate structure, in the form of a latent class model, can be recovered from aggregate data. We show that under specific assumptions, and when the household-level model is correctly specified, most of a latent class segmentation structure is identifiable even if only store-level aggregate data are available. The store-data-based estimates of the latent class model are therefore consistent: the Mean Absolute Deviation (MAD) of the estimates goes to zero as the sample size grows without bound. To the manager, however, the finite-sample properties of store-data-based estimates are of more practical interest.
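Why aggregation does not erase all segment information can be seen from how store-level shares arise in a latent class model: each segment has its own choice probabilities, and the store observes only their mixture weighted by segment sizes. The sketch below assumes a multinomial logit within each segment, utilities of the form intercept plus segment-specific price coefficient times price, and no outside (no-purchase) option. All function names, parameter values, and the two-segment example are hypothetical illustrations, not the paper's specification. It also defines MAD, assumed here to be the mean absolute difference between estimated and true parameter values.

```python
import math


def segment_utilities(intercepts, price_coef, prices):
    """Deterministic utilities for one segment (hypothetical form: intercept + beta * price)."""
    return [a + price_coef * p for a, p in zip(intercepts, prices)]


def segment_shares(utilities):
    """Multinomial logit choice probabilities for one segment."""
    m = max(utilities)  # subtract the max utility for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def store_shares(segment_sizes, segment_utils):
    """Store-level (aggregate) shares: segment logit shares weighted by segment sizes."""
    n_brands = len(segment_utils[0])
    shares = [0.0] * n_brands
    for pi_k, utils_k in zip(segment_sizes, segment_utils):
        probs = segment_shares(utils_k)
        for j in range(n_brands):
            shares[j] += pi_k * probs[j]
    return shares


def mad(estimates, truth):
    """Mean absolute deviation of estimates from the true parameter values."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


# Hypothetical example: a price-sensitive segment (60%) and a less price-sensitive one (40%).
prices = [1.0, 1.2, 0.9]
sizes = [0.6, 0.4]
seg_utils = [
    segment_utilities([0.0, 0.5, -0.2], -3.0, prices),
    segment_utilities([0.0, 1.5, 0.3], -0.5, prices),
]
shares = store_shares(sizes, seg_utils)
print([round(s, 3) for s in shares])
```

Because each segment responds differently to price, variation in prices across stores and weeks moves the mixed shares in a way that depends on the segment structure; this is the kind of variation that makes identification from aggregate data possible in principle.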
To assess how well latent class structure can be estimated from store datasets with sample sizes comparable to those seen in practice, we simulate more than sixty thousand store datasets and compute the resulting model estimates and their estimation errors. Our simulations answer the question: how quickly does the MAD of store-data estimates diminish with sample size? The results show that the MAD of the latent class estimates diminishes substantially more slowly with store data than with household data. Moreover, this rate is so slow that obtaining estimates with reasonably small MADs will often require unreasonably large sample sizes. Our simulations also offer guidance on the conditions that favor more accurate estimates from store-level data.
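The replication logic of such a Monte Carlo study can be sketched as follows. This is a simplified, hypothetical stand-in rather than the paper's design: it measures the MAD of observed store-level shares around their true values, not the MAD of estimated latent class parameters, which would require refitting the model in every replication. The function name, true shares, sample sizes, replication count, and seed are all illustrative assumptions.

```python
import random


def simulate_share_mad(true_shares, n, replications, rng):
    """Average MAD between observed and true shares over simulated store datasets of n transactions."""
    brands = list(range(len(true_shares)))
    total = 0.0
    for _ in range(replications):
        # Draw n checkout transactions from the true aggregate shares.
        draws = rng.choices(brands, weights=true_shares, k=n)
        counts = [0] * len(true_shares)
        for b in draws:
            counts[b] += 1
        observed = [c / n for c in counts]
        # MAD of observed shares for this replication.
        total += sum(abs(o - t) for o, t in zip(observed, true_shares)) / len(true_shares)
    return total / replications


rng = random.Random(2004)
true_shares = [0.5, 0.3, 0.2]
for n in (100, 400, 1600):
    print(n, round(simulate_share_mad(true_shares, n, 200, rng), 4))
```

Sampling error in an observed share shrinks roughly in proportion to 1/sqrt(n), so quadrupling n should roughly halve this MAD. Replacing the share comparison with a model fit and a parameter-level MAD in each replication turns the same loop into a study of estimator accuracy.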
Keywords: Aggregation, latent class model, logit model, choice model, segmentation, panel data
JEL Classification: C33