How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms
60 Pages · Posted: 24 Jan 2021 · Last revised: 13 Dec 2022
Date Written: December 12, 2022
Abstract
Through repeated interactions, firms today adaptively refine their understanding of individual users' preferences for personalization. In this paper, we use a continuous-time bandit model to analyze firms that recommend content to multi-homing consumers, a representative setting in which firms strategically learn consumer preferences to maximize lifetime value. In both monopoly and duopoly settings, we compare a forward-looking recommendation algorithm that balances exploration and exploitation with a myopic algorithm that maximizes only the quality of the next recommendation. Our analysis shows that, compared to a monopoly, firms competing for users' attention focus more on exploitation than on exploration. When users are impatient, competition decreases the return from developing a forward-looking algorithm. In contrast, developing a forward-looking algorithm may hurt users under monopoly but always benefits them under competition. Competing firms' decisions to invest in a forward-looking algorithm can create a prisoner's dilemma. Our results have implications for firms' AI adoption decisions and for policymakers concerned with the effect of market power on innovation and consumer welfare.
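The exploration-exploitation contrast at the heart of the abstract can be illustrated with a minimal discrete-time sketch (not the paper's continuous-time model): a myopic recommender always serves the content type it currently believes is best, while a forward-looking recommender occasionally samples other content to sharpen its preference estimates. The function below, with its epsilon-greedy exploration rule and Gaussian engagement noise, is a hypothetical illustration, not the authors' algorithm.

```python
import random

def run_bandit(explore, true_means, horizon=5000, eps=0.1, seed=0):
    """Simulate `horizon` recommendation rounds over content types with
    unknown appeal `true_means`. explore=False: myopic (greedy on current
    estimates); explore=True: epsilon-greedy, a simple stand-in for
    forward-looking behavior. Returns average realized engagement."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [1] * n        # one pseudo-observation per arm to seed the prior
    est = [0.5] * n         # prior estimate of each content type's appeal
    total = 0.0
    for _ in range(horizon):
        if explore and rng.random() < eps:
            arm = rng.randrange(n)                     # explore: try any content type
        else:
            arm = max(range(n), key=lambda a: est[a])  # exploit current belief
        reward = rng.gauss(true_means[arm], 0.1)       # noisy user engagement signal
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total / horizon
```

Running both variants on the same preference profile (e.g. `run_bandit(True, [0.3, 0.5, 0.7])` versus `run_bandit(False, [0.3, 0.5, 0.7])`) shows how the greedy policy can lock onto a mediocre content type early, whereas the exploring policy pays a small per-round cost to keep learning; competition, in the paper's analysis, shifts this balance toward exploitation.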
Keywords: AI, multi-agent bandit, recommendation algorithm, innovation, competition, reinforcement learning, experimentation, CLV, value of learning, forward-looking optimization
JEL Classification: C73, D40, D83, L10, M31