How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms
57 Pages · Posted: 24 Jan 2021 · Last revised: 17 Aug 2022
Date Written: November 30, 2020
Through repeated interactions, firms today adaptively refine their understanding of individual users' preferences for personalization. In this paper, we use a continuous-time bandit model to analyze firms that recommend content to multi-homing consumers, a representative setting for strategic learning of consumer preferences to maximize lifetime value. In both monopoly and duopoly settings, we compare a forward-looking recommendation algorithm that balances exploration and exploitation with a myopic algorithm that only maximizes the quality of the next recommendation. Our analysis shows that firms competing for users' attention focus more on exploitation relative to exploration than a monopolist does. When users are impatient, competition decreases the return from developing forward-looking algorithms. On the other hand, developing the forward-looking algorithm may hurt users under monopoly but always benefits users under competition. Competing firms' decisions to invest in the forward-looking algorithm create a prisoner's dilemma unless the development cost is sufficiently low. Our results provide implications for AI adoption, as well as for policymakers, on the effect of market power on innovation and consumer welfare.
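The contrast between the two algorithms can be illustrated with a minimal discrete-time sketch: a myopic (greedy) recommender that always serves the item with the best current estimate, versus a forward-looking recommender that adds a UCB-style exploration bonus. Note this is a hypothetical simplification; the paper itself analyzes a continuous-time bandit model, and the arm probabilities, horizon, and policy names below are assumptions for illustration only.

```python
import math
import random

def run_policy(policy, probs, horizon, seed=0):
    """Total reward of `policy` on a Bernoulli bandit with success rates `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)   # pulls per arm
    means = [0.0] * len(probs)  # empirical mean reward per arm
    total = 0
    for t in range(horizon):
        arm = policy(means, counts, t)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        total += reward
    return total

def myopic(means, counts, t):
    # Pure exploitation: recommend the arm with the best estimate so far.
    return max(range(len(means)), key=lambda a: means[a])

def forward_looking(means, counts, t):
    # UCB1-style: exploration bonus shrinks as an arm is sampled more often.
    for a, c in enumerate(counts):
        if c == 0:
            return a  # try every arm at least once
    return max(range(len(means)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t + 1) / counts[a]))
```

Run over a long horizon, the myopic policy can lock onto an inferior item it happened to sample first, while the forward-looking policy pays a short-run exploration cost but converges to the better item, mirroring the exploration/exploitation trade-off the abstract describes.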
Keywords: AI, multi-agent bandit, recommendation algorithm, innovation, competition, reinforcement learning, experimentation, CLV, value of learning, forward-looking optimization
JEL Classification: C73, D40, D83, L10, M31