Bayesian Consumer Profiling
97 Pages. Posted: 2 Mar 2016. Last revised: 12 May 2020
Date Written: May 11, 2020
Firms use aggregate data from data brokers (e.g., Acxiom, Experian) and external data sources (e.g., Census) to unobtrusively infer the likely characteristics of consumers and thus better predict their profiles and needs. We demonstrate that the simple count method most commonly used in this effort relies on an assumption of conditional independence that fails to hold in many settings of managerial interest. We develop a Bayesian profiling method that leverages a different independence assumption, and we use simulations to show that, in managerially relevant settings, the Bayesian method outperforms the simple count method, often by an order of magnitude. We then compare both methods in three case studies. In the first, we estimate customers’ ages on the basis of their first names; prediction errors decrease substantially. In the second, the approach correctly identifies 99.9% of people’s political affiliations based on their ZIP codes (vs. 30.3% with the simple count method). In the third, we infer the income, occupation, and education of online visitors to a marketing analytics software company, based exclusively on visitors’ IP addresses. We also show how the Bayesian profiling method intersects with the missing-data framework of Little and Rubin when the analyst knows the variable of interest for some customers and has access to a reference list for imputing it for the remaining ones.
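As a rough illustration of the kind of inference described above, the sketch below applies Bayes' rule to a first-name-to-age problem. It is not the authors' method, whose details the abstract does not give. It only shows how a reference table and a prior over the firm's own customer base can be combined. All names, counts, age bands, and the function `posterior_age` are hypothetical and chosen purely for illustration.

```python
from fractions import Fraction

# Hypothetical reference list: number of people carrying each first name,
# broken down by age band (toy numbers, not real data).
REFERENCE_COUNTS = {
    "Mildred": {"18-39": 10, "40-64": 40, "65+": 150},
    "Jayden": {"18-39": 180, "40-64": 15, "65+": 5},
}

# Hypothetical size of each age band in the reference population.
REFERENCE_TOTALS = {"18-39": 1000, "40-64": 1000, "65+": 1000}

# Hypothetical prior: the age mix of the firm's own customer base,
# which may differ from the reference population.
CUSTOMER_PRIOR = {
    "18-39": Fraction(6, 10),
    "40-64": Fraction(3, 10),
    "65+": Fraction(1, 10),
}


def posterior_age(name, prior):
    """Return P(age band | first name) via Bayes' rule.

    The likelihood P(name | age band) is estimated from the reference
    counts; the prior P(age band) is supplied by the caller.
    """
    # Likelihood of observing this name within each age band.
    likelihood = {
        band: Fraction(REFERENCE_COUNTS[name][band], REFERENCE_TOTALS[band])
        for band in prior
    }
    # Unnormalized posterior: likelihood times prior.
    unnormalized = {band: likelihood[band] * prior[band] for band in prior}
    evidence = sum(unnormalized.values())
    return {band: unnormalized[band] / evidence for band in prior}


if __name__ == "__main__":
    for band, prob in posterior_age("Mildred", CUSTOMER_PRIOR).items():
        print(f"{band}: {float(prob):.3f}")
```

In this toy setup, reading proportions straight off the reference table (equivalent to a uniform prior here) would put "Mildred" in the 65+ band with probability 3/4, whereas a customer base that skews young pulls that estimate down to 5/11. The point is only that the customer-base prior can shift the answer materially. How the paper's Bayesian profiling method is actually specified is left to the paper itself.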
Keywords: Consumer profiling; Data augmentation; Data brokerage; Bayesian profiling; Sociodemographic profiling; First name; Age; Political partisanship; Geolocation; Missing data
JEL Classification: M3, M31, C11