Scalable Data Fusion with Selection Correction: An Application to Customer Base Analysis
75 Pages. Posted: 11 Apr 2019
Date Written: May 28, 2020
Increasingly, applied researchers study problems for which multiple sources of data are available. These sources may differ in their degree of aggregation, and some may not be representative of the population of interest. Using multiple data sources can lead to richer insights. However, existing data fusion approaches do not correct for selection bias in sources that may not be representative, and they either do not scale to large populations or are statistically inefficient. We propose an aggregate-disaggregate data fusion method that corrects for selection bias and is both computationally scalable and statistically efficient. We apply the method to estimate a model of customer acquisition and churn at subscription-based firms, bringing the model to life with a large credit card panel and public data from Spotify, the music streaming service. This application and supporting simulations show that incorporating granular data through our data fusion method enhances identification and offers richer insights than extant approaches. We find, for example, that previously churned customers who return to Spotify stay longer than newly acquired subscribers do, implying a more sanguine view of Spotify's future retention profile than that of previous approaches that do not use multiple data sources.
Keywords: data fusion; selection correction; customer relationship management; marketing-finance interface
JEL Classification: M31; C13