Machine Learning for Demand Estimation in Long Tail Markets
43 Pages Posted: 17 Nov 2020
Date Written: September 29, 2020
Abstract
Random coefficient multinomial logit models (Berry et al. 1995) are widely used to estimate customer preferences from sales data. However, these models can only accommodate products with positive sales, and the resulting selection leads to highly biased estimates in long tail markets, i.e., markets where many products have zero or low sales. Such markets are increasingly common in online retail and other online marketplaces. In this paper, we propose a two-stage estimator that uses machine learning to correct for this bias. In the first stage, our method uses deep learning to predict the market shares of all products, with a neural network whose structure mirrors the data generating process of the random coefficient logit model. In the second stage, we use the first-stage predictions to re-weight the observed shares in a way that corrects for the induced bias and maintains the causal interpretation of the structural model. We show that the estimated parameters are consistent as the number of markets grows. Our method performs well on simulated long tail data, producing accurate estimates of customer behavior. These improved estimates can subsequently be used to provide prescriptive policy recommendations on important managerial decisions such as pricing, assortment, or the introduction of new products.
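The selection problem described above can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes a plain logit model without random coefficients, and the assortment size, number of customers, and utility distribution are all hypothetical values chosen to produce a long tail. It shows only that the standard logit inversion, log(s_j) - log(s_0), is undefined for products with zero sales, so those products drop out of the estimation sample and the surviving products are not representative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_products = 200   # hypothetical assortment size
n_customers = 500  # hypothetical number of customer arrivals in one market

# Hypothetical mean utilities; the low mean and wide spread create a long tail.
delta = rng.normal(loc=-8.0, scale=2.0, size=n_products)

# Plain logit choice probabilities (outside option utility normalized to 0).
expu = np.exp(delta)
denom = 1.0 + expu.sum()
p_inside = expu / denom
p_outside = 1.0 / denom

# Observed sales are a finite multinomial draw from the true probabilities.
counts = rng.multinomial(n_customers, np.append(p_inside, p_outside))
sales, outside = counts[:-1], counts[-1]

shares = sales / n_customers
s0 = outside / n_customers

# Logit inversion: delta_j = log(s_j) - log(s_0), defined only when s_j > 0.
positive = sales > 0
delta_hat = np.full(n_products, np.nan)
delta_hat[positive] = np.log(shares[positive]) - np.log(s0)

print("products with zero sales:", int((~positive).sum()), "of", n_products)
print("true mean utility, all products:     ", delta.mean())
print("true mean utility, surviving products:", delta[positive].mean())
```

In this sketch the products that survive the positive-sales filter have systematically higher true utility than the full assortment, which is the kind of selection that the two-stage procedure in the abstract is designed to correct.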
Keywords: Demand Estimation, Machine Learning, Random Coefficients MNL, Long Tail