Finding Best Answers for the Iterated Prisoner’s Dilemma Using Improved Q-Learning
177 Pages. Posted: 9 Apr 2020
Date Written: March 18, 2020
Given an arbitrary black-box strategy for the Iterated Prisoner’s Dilemma, it is often difficult to gauge to what extent other strategies can exploit it. Under imperfect public monitoring, with the observation errors it entails, deriving a theoretical solution is even more time-consuming. However, for any strategy, the reinforcement learning algorithm Q-Learning can construct a best response in the limit. In this article, I present and discuss several improvements to the Q-Learning algorithm that allow for an easy numerical measure of the exploitability of a given strategy. Additionally, I give a detailed introduction to reinforcement learning aimed at economists.
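To illustrate how Q-Learning can approximate a best response, the following is a minimal sketch of plain tabular Q-Learning (not the improved variants discussed in the article) playing against Tit-for-Tat without observation errors. The payoff values (5, 3, 1, 0), the discount factor, the learning rate, the exploration rate, and the function name `q_learn_vs_tit_for_tat` are all illustrative assumptions and are not taken from the article.

```python
import random

# Assumed standard payoffs for the learner: (my action, opponent action) -> reward.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def q_learn_vs_tit_for_tat(steps=200_000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-Learning against Tit-for-Tat (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    # State = the move Tit-for-Tat is about to play, i.e. a copy of the learner's last move.
    q = {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}
    state = "C"  # Tit-for-Tat opens with cooperation
    for _ in range(steps):
        # Epsilon-greedy exploration: random move with probability epsilon.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        reward = PAYOFF[(action, state)]
        next_state = action  # Tit-for-Tat will mirror this move next round
        # Standard Q-Learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q = q_learn_vs_tit_for_tat()
policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in ACTIONS}
print(policy, round(q["C"]["C"], 2))
```

Under these assumed parameters, the learned greedy policy should be to cooperate in both states, and the value of mutual cooperation should approach 3 / (1 - 0.9) = 30. Comparing such a learned value with what the strategy earns against itself is one possible way to read off a numerical exploitability measure; the article's own measure may differ.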
Keywords: Iterated Prisoner’s Dilemma, Repeated Prisoner’s Dilemma, Imperfect Public Monitoring, Reinforcement Learning, Q-Learning, Neural Networks, Gradient Boosting, Machine Learning
JEL Classification: C61, C63, C72, C73