Scalable Data Fusion with Selection Correction: An Application to Customer Base Analysis

75 Pages. Posted: 11 Apr 2019. Last revised: 13 Dec 2022.


Daniel McCarthy

Emory University - Department of Marketing

Shin Oblander

Columbia Business School

Date Written: May 28, 2020


Increasingly, applied researchers study problems for which multiple sources of data are available. These sources may come with varying degrees of aggregation, and some of them may not be representative of the population of interest. Utilizing multiple data sources could lead to richer insights. However, existing data fusion approaches do not correct for selection bias in data sources that may not be representative, and either do not scale to large populations or are statistically inefficient. We propose an aggregate-disaggregate data fusion method which corrects for selection bias and is both computationally scalable and statistically efficient. We apply the method to estimate a model of customer acquisition and churn at subscription-based firms. We bring the model to life using a large credit card panel and public data from Spotify, the music streaming service. This application and supporting simulations show that incorporating the granular data through our data fusion method enhances identification and offers richer insights than extant approaches. We find, for example, that previously churned customers remain with Spotify longer than newly adopted subscribers do, implying a more sanguine view of Spotify's future retention profile than previous approaches which do not use multiple data sources.

Keywords: data fusion; selection correction; customer relationship management; marketing-finance interface

JEL Classification: M31; C13

Suggested Citation

McCarthy, Daniel and Oblander, Shin, Scalable Data Fusion with Selection Correction: An Application to Customer Base Analysis (May 28, 2020). Marketing Science, 40(3), 459-480.

Daniel McCarthy (Contact Author)

Emory University - Department of Marketing

Goizueta Business School
1300 Clifton Road
Atlanta, GA 30322
United States

Shin Oblander

Columbia Business School

New York, NY 10027
United States

