Leveraging Scalable Data Fusion to Enhance Customer Base Predictions
54 Pages. Posted: 11 Apr 2019. Last revised: 30 Sep 2019.
Date Written: July 23, 2019
When modeling the acquisition and retention behavior of a firm's customer base, some analysts rely upon aggregated data provided by the firm, while others rely upon granular panel data provided by third parties, such as credit card panels. Leveraging both sources of data simultaneously could allow for better predictions and richer customer base insights. However, existing approaches for aggregate-disaggregate data fusion are difficult to use in this context for several reasons: the panel may not be representative of the customer base as a whole, both data sources may suffer from missingness, and the target population may be very large as it represents all potential customers of a company. We propose an aggregate-disaggregate data fusion method which is computationally scalable to massive populations and allows for censored and/or truncated, non-representative panel data. We apply our method to data from Spotify, a music streaming service. By incorporating credit card panel data with our data fusion method, we obtain better predictions and richer insights than prior work that only made use of aggregate data. In particular, we predict future aggregated metrics more accurately and separate out the initial versus repeat behavior of customers, providing deeper insight into acquisition and retention dynamics driving growth.
Keywords: data fusion; missing data; customer acquisition; customer retention; marketing-finance
JEL Classification: M31; C13