Exploration Optimization for Dynamic Assortment Personalization under Linear Preferences

56 Pages · Posted: 31 May 2022


Fernando Bernstein

Duke University

Sajad Modaresi

University of North Carolina (UNC) at Chapel Hill - Kenan-Flagler Business School

Denis Saure

University of Chile - Industrial Engineering

Date Written: June 18, 2024

Abstract

We study the dynamic assortment personalization problem of an online retailer that adaptively personalizes assortments based on customers' attributes to learn their preferences and maximize revenue. We assume a linear relationship between product utilities and customer attributes that governs customers' preferences for products. The coefficient matrix characterizing this linear relationship is unknown to the retailer, so the retailer faces the classic trade-off between exploration (learning preferences) and exploitation (earning revenue). We show that there are price-driven and linearity-driven efficiencies that can be leveraged for exploration: not all products need to be shown to all customer profiles to recover the optimal assortments and maximize revenue. We prove an instance-dependent lower bound on the regret (i.e., the expected revenue loss relative to a clairvoyant retailer) of any admissible policy and show that this lower bound depends on the optimal objective value of a Regret Lower Bound (RLB) problem. Even though the RLB problem is a linear program, solving it and using its solution in practice can be challenging, as it has a complex structure and depends non-trivially on parameters that are unknown to the retailer. We therefore also consider an alternative formulation, which we call the Exploration-Optimization problem, that imposes a simple, easily interpretable structure for exploration. We show that this problem can be formulated as a Mixed Integer Linear Program (MILP) that can be solved effectively with state-of-the-art solvers. We design efficient learning policies that identify an efficient exploration set by solving either the RLB or the Exploration-Optimization problem. We also prove a regret upper bound for the proposed exploration-optimization policy, providing further theoretical support for its performance. To illustrate the practical value of the proposed policies, we consider a setting calibrated on a dataset from a large Chilean retailer. We find that, in addition to running significantly faster, our policies outperform a Thompson sampling benchmark in terms of regret (revenue). We also run experiments showing that the proposed policies are scalable in practice.
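The paper's formulations are not reproduced on this page, so the sketch below is only illustrative. It assumes a multinomial logit (MNL) choice model (the abstract does not name the choice model) and shows how linear-in-attributes utilities induce purchase probabilities and the expected assortment revenue that a clairvoyant retailer, who knows the coefficient matrix, would maximize. All identifiers and the toy data (`expected_revenue`, `theta`, `prices`) are hypothetical, not taken from the paper.

```python
from itertools import combinations

import numpy as np

def expected_revenue(assortment, theta, x, prices):
    """Expected revenue of an assortment under an (assumed) MNL choice
    model with linear-in-attributes utilities.

    assortment : list of indices of offered products
    theta      : (n_products, n_attributes) coefficient matrix; unknown
                 to the retailer in the paper, known here for illustration
    x          : (n_attributes,) customer attribute (profile) vector
    prices     : (n_products,) product prices
    """
    utilities = theta[assortment] @ x      # linear preference model
    weights = np.exp(utilities)
    denom = 1.0 + weights.sum()            # the "1" is the no-purchase option
    choice_probs = weights / denom         # MNL purchase probabilities
    return float(prices[assortment] @ choice_probs)

# Toy example: 5 products, 3 customer attributes.
rng = np.random.default_rng(0)
theta = rng.normal(size=(5, 3))
x = np.array([1.0, 0.2, -0.5])             # one customer profile
prices = np.array([10.0, 12.0, 8.0, 15.0, 9.0])

# Brute-force the revenue-maximizing assortment (fine for tiny catalogs).
best = max(
    (s for k in range(1, 6) for s in combinations(range(5), k)),
    key=lambda s: expected_revenue(list(s), theta, x, prices),
)
print(best, expected_revenue(list(best), theta, x, prices))
```

For realistic catalog sizes, the brute-force search at the end would be replaced by the structured optimization the paper develops; it is included here only to keep the sketch self-contained.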

Keywords: Dynamic Assortment Planning, Personalization, Multi-Armed Bandit, Online Retailing

Suggested Citation

Bernstein, Fernando and Modaresi, Sajad and Saure, Denis, Exploration Optimization for Dynamic Assortment Personalization under Linear Preferences (June 18, 2024). Available at SSRN: https://ssrn.com/abstract=4115721 or http://dx.doi.org/10.2139/ssrn.4115721

Fernando Bernstein

Duke University ( email )

100 Fuqua Drive
Durham, NC 27708-0204
United States

Sajad Modaresi (Contact Author)

University of North Carolina (UNC) at Chapel Hill - Kenan-Flagler Business School ( email )

McColl Building
Chapel Hill, NC 27599-3490
United States

Denis Saure

University of Chile - Industrial Engineering ( email )

República 701, Santiago
Chile


Paper statistics

Downloads: 181
Abstract Views: 744
Rank: 337,981