How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms
50 Pages. Posted: 24 Jan 2021
Date Written: November 30, 2020
Through repeated interactions with users, firms today adaptively refine their understanding of individual users' preferences for personalized targeting and recommendation. In this paper, we use a continuous-time bandit model to analyze firms that supply content to consumers, a representative setting for strategic learning of consumer preferences to maximize lifetime value. In both monopoly and duopoly settings, we compare a forward-looking recommendation algorithm that balances exploration and exploitation with a myopic algorithm that maximizes only the current quality of the recommendation. Our analysis shows that competition can discourage learning. In a duopoly where firms compete for consumers' attention, firms' recommendations lean more toward exploitation and less toward exploration than a monopolist's would. When users are impatient, competition increases firms' incentives to develop myopic algorithms but decreases their incentives to develop forward-looking algorithms. Development of the optimal forward-looking algorithm may hurt users under monopoly but benefits users under competition. We are among the first to examine and compare the equilibrium of this multi-agent bandit problem under different competitive scenarios. Our results have implications for firms adopting AI strategies and for policy makers assessing the effect of market power on innovation and consumer welfare.
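To make the distinction between the two algorithms concrete, the following is a minimal sketch, not the paper's model. It assumes a single firm, discrete time, a finite horizon, one familiar content type with a known payoff, and one unfamiliar content type whose like-probability has a Beta belief. All names and parameter values (`SAFE`, `HORIZON`, `delta`, the function names) are hypothetical and chosen only for illustration. The myopic rule compares current expected payoffs, while the forward-looking rule solves the Bayesian dynamic program by backward induction and so accounts for the value of learning.

```python
from functools import lru_cache

SAFE = 0.55    # known per-period payoff of the familiar content (assumed)
HORIZON = 30   # number of recommendation periods (assumed)


@lru_cache(maxsize=None)
def value(a, b, t, delta):
    """Optimal expected discounted payoff from period t onward.

    The unfamiliar content pays 1 with unknown probability theta,
    and the current belief about theta is Beta(a, b).
    """
    if t == HORIZON:
        return 0.0
    return max(safe_value(a, b, t, delta), risky_value(a, b, t, delta))


def safe_value(a, b, t, delta):
    # Recommending the familiar content teaches nothing: the belief is unchanged.
    return SAFE + delta * value(a, b, t + 1, delta)


def risky_value(a, b, t, delta):
    # Recommending the unfamiliar content pays off now and updates the belief.
    p = a / (a + b)  # posterior mean of theta
    liked = 1.0 + delta * value(a + 1, b, t + 1, delta)
    disliked = delta * value(a, b + 1, t + 1, delta)
    return p * liked + (1 - p) * disliked


def myopic_choice(a, b):
    # Exploit only: compare current expected payoffs and ignore learning.
    return "risky" if a / (a + b) > SAFE else "safe"


def forward_looking_choice(a, b, delta, t=0):
    # Balance exploration and exploitation through the continuation values.
    if risky_value(a, b, t, delta) > safe_value(a, b, t, delta):
        return "risky"
    return "safe"
```

Under a uniform Beta(1, 1) belief, `myopic_choice(1, 1)` returns `"safe"` because the expected payoff 0.5 is below 0.55. `forward_looking_choice(1, 1, delta=0.9)` returns `"risky"`, since the information gained is worth the short-term loss, while `forward_looking_choice(1, 1, delta=0.1)` returns `"safe"`: with heavy discounting the forward-looking rule coincides with the myopic one in this toy setting.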
Keywords: AI, multi-agent bandit, recommendation algorithm, innovation, competition, reinforcement learning, experimentation, CLV, value of learning, forward-looking optimization
JEL Classification: C73, D40, D83, L10, M31