How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms
53 Pages · Posted: 24 Jan 2021 · Last revised: 13 Sep 2021
Date Written: November 30, 2020
Through repeated interactions, firms today adaptively refine their understanding of individual users' preferences for personalization. In this paper, we use a continuous-time multi-agent bandit model to analyze firms that supply content to consumers, a representative setting for the strategic learning of consumer preferences to maximize lifetime value. In both monopoly and duopoly settings, we compare a forward-looking recommendation algorithm that balances exploration and exploitation against a myopic algorithm that only maximizes the quality of the next recommendation. Our analysis shows that firms competing for users' attention focus more on exploitation and less on exploration than a monopolist would. When users are impatient, competition decreases firms' incentives to develop forward-looking algorithms. On the other hand, developing the optimal forward-looking algorithm may hurt users under monopoly but always benefits users under competition. We are among the first to examine this multi-agent bandit problem under different competitive scenarios, and our results carry implications for AI adoption as well as for policy makers concerned with the effect of market power on innovation and consumer welfare.
Keywords: AI, multi-agent bandit, recommendation algorithm, innovation, competition, reinforcement learning, experimentation, CLV, value of learning, forward-looking optimization
JEL Classification: C73, D40, D83, L10, M31