Sequential Choice Bandits: Learning with Marketing Fatigue
49 Pages. Posted: 8 Apr 2019. Last revised: 25 Nov 2019
Date Written: March 18, 2019
Motivated by the observation that overexposure to unwanted marketing activities can lead to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the platform due to marketing fatigue. We propose a novel sequential choice model to capture the multiple interactions that take place between the platform and its users: upon receiving a message, a user decides whether to accept or reject it. If she rejects it, she then decides either to receive the next message in the sequence or to abandon the platform. Based on user feedback, the platform dynamically learns users' abandonment distribution and the relevance of the recommended content. With the goal of maximizing the cumulative payoff over a horizon of length T, the platform dynamically adjusts the set of messages offered and the order in which they are shown to a user. We refer to this online learning task as the sequential choice bandit (SC-Bandit) problem. For the offline combinatorial optimization problem, we present a polynomial-time algorithm. For the online problem, we consider two variants, depending on whether contexts are included, and propose algorithms that balance exploration and exploitation. Lastly, we evaluate the performance of our algorithms on both synthetic and real-world datasets.
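To make the accept / reject / abandon dynamics described above concrete, the following is a minimal illustrative sketch, not the paper's formulation. It rests on several assumptions that the abstract does not state: each message has a fixed acceptance probability and revenue, an accepted message ends the interaction, a user who rejects a message continues with a constant probability `continue_prob`, and abandonment costs a fixed `abandon_penalty`. All function and parameter names are hypothetical. The exhaustive search over ordered subsets is only a baseline for tiny instances; it is not the polynomial-time algorithm the abstract refers to.

```python
from itertools import permutations


def expected_payoff(sequence, accept_prob, revenue, continue_prob, abandon_penalty):
    """Expected payoff of showing `sequence` in order under the assumed model.

    Assumptions (not taken from the paper): acceptance ends the interaction,
    and after every rejection the user abandons with probability
    1 - continue_prob, which costs `abandon_penalty`.
    """
    reach = 1.0  # probability that the user is still around to see this message
    total = 0.0
    for msg in sequence:
        u = accept_prob[msg]
        # Revenue is collected if the user reaches and accepts the message.
        total += reach * u * revenue[msg]
        # Penalty is paid if the user reaches, rejects, and then abandons.
        total -= reach * (1.0 - u) * (1.0 - continue_prob) * abandon_penalty
        # The user moves on only after rejecting and choosing to continue.
        reach *= (1.0 - u) * continue_prob
    return total


def best_sequence_brute_force(messages, accept_prob, revenue, continue_prob, abandon_penalty):
    """Try every ordered subset of `messages`; exponential, for tiny instances only."""
    best_seq, best_val = (), 0.0  # the empty sequence earns nothing and risks nothing
    for k in range(1, len(messages) + 1):
        for seq in permutations(messages, k):
            val = expected_payoff(seq, accept_prob, revenue, continue_prob, abandon_penalty)
            if val > best_val:
                best_seq, best_val = seq, val
    return best_seq, best_val


# Hypothetical two-message instance.
accept_prob = {"a": 0.5, "b": 0.2}
revenue = {"a": 1.0, "b": 2.0}
seq, val = best_sequence_brute_force(["a", "b"], accept_prob, revenue,
                                     continue_prob=0.8, abandon_penalty=0.5)
print(seq, round(val, 3))
```

In this toy instance the order matters: showing "b" before "a" yields a higher expected payoff than the reverse, even though "a" is more likely to be accepted. In the online setting the acceptance and continuation probabilities would be unknown and would have to be estimated from user feedback.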
Keywords: sequential choice, learning to rank, marketing fatigue, online learning, bandit