Using Aggregate-Disaggregate Data Fusion to Forecast the Inflow and Outflow of Customers
64 Pages
Posted: 11 Apr 2019
Date Written: March 14, 2019
When forecasting the number of customers a firm will acquire and lose, some analysts rely upon aggregated data provided by the firm, while others rely upon granular panel data provided by third parties. There are benefits and limitations to both data sources, so a natural idea would be to combine them, with the prospect of obtaining the benefits of both while mitigating their limitations. Fusing them together in a valid manner, however, is complicated for several reasons: the data sources operate at differing levels of granularity, the third-party data's panel members may be a non-representative sample, and both data sources may be censored and/or truncated. This issue is particularly severe when forecasting the inflow and outflow of customers, because target populations are often very large (e.g., over one billion) and outcomes are high-dimensional. We propose a computationally scalable estimator for this data structure which maximizes a "proxy likelihood" function that asymptotically approximates the model likelihood function. Under mild regularity conditions, our estimator achieves consistency and asymptotic normality. We apply this estimator to data from Spotify, a music streaming service. Through this application and supporting simulations, we show that incorporating third-party panel data significantly improves predictive validity over simpler methods.
Keywords: data fusion, prediction, marketing, finance
JEL Classification: M31, C13