Exploration Optimization for Dynamic Assortment Personalization under Linear Preferences
40 Pages, Posted: 31 May 2022
Date Written: April 27, 2022
We study efficient real-time data collection for an online retailer that dynamically personalizes assortments based on customers’ attributes to learn their preferences and maximize revenue. Prior work on personalization in the operations management and marketing literature generally assumes a linear relationship between product utilities and customer attributes. An important implication of this assumption, which has not received much attention in the literature, is that one can infer a customer’s preference for a product using transaction data from other customers. In other words, demand learning can be shared across customer profiles. We leverage this insight to study the structure of efficient exploration in an online assortment personalization setting. We prove a lower bound on the asymptotic regret of any admissible policy and show that not all products and customer profiles need to be explored in order to estimate customer demand. We apply this insight to design efficient learning policies. In particular, we propose adaptive learning policies that solve a linear mixed integer program, called the exploration-optimization problem, to identify an efficient exploration set which determines what assortments to display to which customer profiles. To illustrate the practical value of the proposed policies, we consider a setting calibrated using a dataset from a large Chilean retailer. We compare the performance of our policies to that of Thompson sampling and show that there is a significant gain from using our proposed policies as they focus exploration efforts on the “right” subset of products and customer profiles.
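The key structural assumption above, that product utilities are linear in customer attributes, is what allows transaction data to be shared across customer profiles: estimating a product's weight vector from some profiles immediately yields utility predictions for profiles never observed. A minimal numerical sketch of this pooling idea (all dimensions, names, and noise levels are hypothetical illustrations, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3            # number of customer attributes (hypothetical)
n_products = 4
# Ground-truth linear preference weights: one row per product.
W = rng.normal(size=(n_products, d))

# Transactions are observed only for a subset of customer profiles.
profiles_seen = rng.normal(size=(50, d))
# Noisy utility signals for each product under each observed profile.
utilities = profiles_seen @ W.T + 0.01 * rng.normal(size=(50, n_products))

# Pooled least squares recovers each product's weight vector from
# data gathered across *different* customer profiles.
coef, *_ = np.linalg.lstsq(profiles_seen, utilities, rcond=None)
W_hat = coef.T   # shape (n_products, d)

# Predict preferences for a customer profile never seen in the data.
x_new = rng.normal(size=d)
pred_utilities = W_hat @ x_new
true_utilities = W @ x_new
assert np.allclose(pred_utilities, true_utilities, atol=0.05)
```

This is only a caricature of the shared-learning insight (direct utility observations rather than discrete choices, and no assortment decisions); the paper's policies instead choose which assortments to show to which profiles so that such pooled estimation is efficient.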
Keywords: Dynamic Assortment Planning, Personalization, Online Retailing, Multi-Armed Bandit