Robust Learning of Consumer Preferences
76 Pages. Posted: 7 Aug 2018. Last revised: 10 Feb 2020
Date Written: February 10, 2020
This paper studies a class of ranking and selection problems faced by a company that wants to identify the most preferred product out of a finite set of alternatives when consumer preferences are a priori unknown. The only information available is that consumer preferences satisfy two key properties: (i) they are consistent with some unknown true ranking of the alternatives and (ii) they are strict, namely, no two products are equally preferred. To learn the unknown ranking, the company can sample consumer preferences by sequentially showing different subsets of products to different consumers and asking each of them to report their top preference within the displayed set. The objective of the company is to design a display policy that minimizes the expected number of samples needed to identify the top-ranked product with high probability. We prove an instance-specific lower bound on the sample complexity of any policy that identifies the top-ranked product within a given (probabilistic) confidence. We also propose a robust formulation of the company's problem and derive a sampling policy (the Myopic Tracking Policy) that is both worst-case asymptotically optimal and intuitive to implement. Roughly speaking, the Myopic Tracking Policy randomly alternates between two extreme types of display strategies: (i) full display, which shows a consumer the entire menu so as to learn something about every product, and (ii) pair display, which shows a consumer only two products so as to maximize the informativeness of the consumer's choice. To assess the performance of the Myopic Tracking Policy, we conduct a comprehensive set of computational studies and compare it to alternative methods in the literature.
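To make the alternation between full display and pair display concrete, the following is a minimal toy sketch. It is not the paper's Myopic Tracking Policy. Everything in it is an assumption made for illustration: the consumer choice model (`consumer_choice`, with a hypothetical `noise` parameter), the fixed mixing probability `p_full`, the rule of pairing the two products with the highest empirical choice rates, and the fixed sampling `budget` used in place of a confidence-based stopping rule.

```python
import random


def consumer_choice(displayed, true_ranking, noise, rng):
    """Assumed choice model (not from the paper): with probability 1 - noise
    the consumer picks the highest-ranked displayed product; otherwise the
    consumer picks uniformly at random among the displayed products."""
    if rng.random() < noise:
        return rng.choice(displayed)
    # true_ranking lists products from most to least preferred.
    return min(displayed, key=true_ranking.index)


def toy_alternating_policy(products, true_ranking, budget=2000,
                           p_full=0.5, noise=0.2, seed=0):
    """Randomly alternate between full display and pair display for a fixed
    number of consumers, then return the product with the highest empirical
    choice rate (times chosen divided by times displayed)."""
    rng = random.Random(seed)
    shows = {p: 0 for p in products}  # times each product was displayed
    wins = {p: 0 for p in products}   # times each product was chosen

    def rate(p):
        return wins[p] / max(1, shows[p])

    for _ in range(budget):
        if rng.random() < p_full:
            # Full display: show the entire menu.
            displayed = list(products)
        else:
            # Pair display: show the two current empirical leaders.
            displayed = sorted(products, key=rate, reverse=True)[:2]
        choice = consumer_choice(displayed, true_ranking, noise, rng)
        for p in displayed:
            shows[p] += 1
        wins[choice] += 1

    return max(products, key=rate)


products = ["A", "B", "C", "D", "E"]
true_ranking = ["C", "A", "E", "B", "D"]  # hypothetical ground truth
best = toy_alternating_policy(products, true_ranking)
print(best)
```

In this sketch the full displays keep every product's estimate alive, while the pair displays concentrate samples on the two closest contenders. The paper's policy chooses how to mix the two display types in a principled way and stops once the required confidence is reached, neither of which this toy version attempts.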
Keywords: sequential learning, maximum selection, best arm identification, dynamic assortments, preference learning