Finding Best Answers for the Iterated Prisoner’s Dilemma Using Improved Q-Learning

177 pages. Posted: 9 Apr 2020

Martin Kies

LeverageData GmbH; Ulm University

Date Written: March 18, 2020

Abstract

Given an arbitrary black-box strategy for the Iterated Prisoner’s Dilemma, it is often difficult to gauge the extent to which other strategies can exploit it. Under imperfect public monitoring, where observation errors occur, deriving a theoretical solution is even more time-consuming. However, for any strategy, the reinforcement learning algorithm Q-Learning can construct a best response in the limit. In this article, I present and discuss several improvements to the Q-Learning algorithm that allow for an easy numerical measure of the exploitability of a given strategy. Additionally, I give a detailed introduction to reinforcement learning aimed at economists.
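
To make the idea of learning a best response concrete, the following is a minimal sketch of plain tabular Q-Learning against one fixed opponent. It is not the improved algorithm of the paper. Everything in it is an illustrative assumption: the opponent (Tit-for-Tat), the payoff values, the absence of observation errors, the discounted infinite-horizon setting, and the names `PAYOFF`, `tit_for_tat`, and `learn_response`.

```python
import random

# Payoffs indexed by (my_action, opponent_action).
# Conventional textbook values, assumed here for illustration only.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
ACTIONS = ("C", "D")


def tit_for_tat(my_last_action):
    """Hypothetical opponent: cooperate first, then copy the learner's last move."""
    return "C" if my_last_action is None else my_last_action


def learn_response(gamma, steps=200_000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-Learning against tit_for_tat; returns the greedy action per state."""
    rng = random.Random(seed)
    # State = the opponent's upcoming move. For Tit-for-Tat this is fully
    # determined by the learner's previous action (an assumption of this sketch).
    q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}
    my_last = None
    for _ in range(steps):
        state = tit_for_tat(my_last)
        # Epsilon-greedy exploration: random action with probability epsilon.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward = PAYOFF[(action, state)]
        next_state = tit_for_tat(action)
        # Standard Q-Learning update with discount factor gamma.
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        my_last = action
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ACTIONS}
```

Under these assumptions, a patient learner (`learn_response(0.9)`) should settle on cooperating in both states, while an impatient one (`learn_response(0.1)`) should settle on defecting, which matches the best responses to Tit-for-Tat obtained by solving the two-state problem by hand.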

Keywords: Iterated Prisoner’s Dilemma, Repeated Prisoner’s Dilemma, Imperfect Public Monitoring, Reinforcement Learning, Q-Learning, Neural Networks, Gradient Boosting, Machine Learning

JEL Classification: C61, C63, C72, C73

Suggested Citation

Kies, Martin, Finding Best Answers for the Iterated Prisoner’s Dilemma Using Improved Q-Learning (March 18, 2020). Available at SSRN: https://ssrn.com/abstract=3556714 or http://dx.doi.org/10.2139/ssrn.3556714

Martin Kies (Contact Author)

LeverageData GmbH

Wagnerstr. 18
Ulm, 89077
Germany
+4973129879770 (Phone)

Ulm University

Helmholtzstr. 18
Ulm, Baden-Württemberg 89081
Germany
7315015367 (Phone)
