Doubly Adaptive Cascading Bandits with User Abandonment
35 Pages. Posted: 8 Apr 2019. Last revised: 5 May 2023.
Date Written: March 18, 2019
A central task of digital marketing is to promote user engagement. However, overexposure to marketing activities, especially those with irrelevant content, can result in customer dissatisfaction and ultimately lead to user abandonment (e.g., unsubscribing from mailing lists, or deleting an app). Motivated by this phenomenon, we focus on an online learning problem where a platform interacts with its users by sending them a list of messages over time. The platform earns a reward whenever a user accepts a message, and is penalized when a user abandons after being targeted with irrelevant content. This setting extends the popular "cascading bandits" framework by allowing user abandonment. Moreover, instead of a single instantaneous interaction, we explicitly model multiple interactions of an individual with the platform over a period of time. Thus, we focus on a type of doubly adaptive algorithm that not only updates its learning across users but is also capable of dynamically adjusting the sequential content for a given user. We refer to this online learning task as Doubly Adaptive Cascading Bandits (DAC-Bandit). For the offline combinatorial problem, we provide a polynomial-time algorithm. For the online setting, we investigate both the non-contextual and contextual problems and quantify the performance of the proposed algorithms through regret analysis. We evaluate the numerical performance of our algorithms using both synthetic and real-world datasets. Our algorithms demonstrate strong theoretical performance guarantees and promising empirical results when compared to benchmarks.
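To make the interaction model concrete, the following is a minimal sketch of one user session under a cascading model with abandonment. The user examines messages in order; each message is either accepted (reward), triggers abandonment (penalty), or is skipped. The function name, the dictionary-based attraction/abandonment probabilities, and the unit reward and penalty are illustrative assumptions, not the paper's exact formulation.

```python
import random

def simulate_session(attractions, abandon_probs, message_list,
                     reward=1.0, penalty=1.0):
    """Simulate one user session in a cascading model with abandonment.

    attractions[m]   : probability the user accepts message m (assumed).
    abandon_probs[m] : probability an unaccepted message m causes the
                       user to abandon the platform (assumed).
    The user scans message_list in order; the session ends at the first
    acceptance or abandonment.
    """
    for m in message_list:
        if random.random() < attractions[m]:
            return reward       # user accepts the message
        if random.random() < abandon_probs[m]:
            return -penalty     # irrelevant content drives abandonment
    return 0.0                  # user ignores the entire list
```

A doubly adaptive algorithm in this setting would both update probability estimates across many such sessions and reorder or truncate `message_list` mid-session as feedback arrives; the sketch above captures only the environment's response to one fixed list.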
Keywords: cascading model, doubly adaptive, learning to rank, abandonment, bandits