Learning Personalized Product Recommendations with Customer Disengagement
50 Pages · Posted: 13 Sep 2018 · Last revised: 29 Dec 2019
Date Written: August 29, 2018
We consider the problem of sequential product recommendation when customer preferences are unknown. First, we present empirical evidence of customer disengagement using a sequence of ad campaigns from a major airline: customers decide whether to stay on the platform based on the relevance of the recommendations they receive. We then formulate this problem as a linear bandit, with the notable difference that the customer's horizon length is a function of past recommendations. We prove that any algorithm in this setting incurs linear regret; thus, no algorithm can keep all customers engaged, but we can hope to keep a subset of customers engaged. Unfortunately, we find that both classical bandit algorithms and greedy algorithms provably over-explore, thereby incurring linear regret for every customer. We propose modifying bandit learning strategies by constraining the action space upfront using an integer program. We prove that this simple modification allows our algorithm to achieve sublinear regret for a significant fraction of customers. Furthermore, numerical experiments on real movie recommendation data demonstrate that our algorithm can improve customer engagement with the platform by up to 80%.
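To illustrate the model described in the abstract, the following is a minimal sketch (not the paper's actual algorithm) of a linear bandit where the customer's horizon depends on past recommendations: the simulated customer disengages the first time a recommendation's realized reward falls below a tolerance, and the learner's action space can be constrained upfront to a subset of arms. All function and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, arms, allowed, tol, horizon):
    """Run a ridge-regression-based linear UCB learner on a restricted arm set.

    theta   : true (unknown to the learner) preference vector
    arms    : list of feature vectors, one per product
    allowed : indices of arms the learner may recommend (the upfront constraint)
    tol     : the customer disengages the first time reward < tol
    horizon : maximum number of interactions

    Returns (cumulative reward, number of rounds the customer stayed engaged).
    """
    d = len(theta)
    A = np.eye(d)            # ridge regularizer for the design matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(horizon):
        theta_hat = np.linalg.solve(A, b)      # ridge estimate of theta
        A_inv = np.linalg.inv(A)
        # optimistic (UCB) index, computed only over the constrained action set
        idx = max(allowed,
                  key=lambda i: arms[i] @ theta_hat
                  + 0.5 * np.sqrt(arms[i] @ A_inv @ arms[i]))
        x = arms[idx]
        reward = x @ theta + 0.05 * rng.standard_normal()
        total += reward
        A += np.outer(x, x)
        b += reward * x
        if reward < tol:     # disengagement: the customer leaves the platform
            return total, t + 1
    return total, horizon
```

Restricting `allowed` to arms believed to be broadly relevant limits exploration of low-relevance products, which is the intuition behind why a constrained learner can keep more customers engaged than an unconstrained one that over-explores.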
Keywords: Bandits, Online Learning, Recommendation Systems, Disengagement