42 Pages. Posted: 26 Sep 2016
Date Written: August 08, 2016
The literature on learning in unknown environments emphasises reinforcing actions that produce positive results. But, in some cases, success requires shifting from a currently successful action to others. We examine, experimentally and theoretically in a very simple framework, how individuals initially learn by exploiting information from the pay-offs of actions taken but also by exploring new actions. We analyse whether and how they learn that pay-offs are inter-temporally dependent. We then ran the same experiments with individuals able to observe the actions taken by others, the pay-offs obtained by others, or both. Such observations improved pay-offs if one of the pair had learned to obtain the maximum pay-off.
Keywords: multi-armed bandit, reinforcement learning, eureka moment, pay-off patterns, observational learning
JEL Classification: D810, D830
Suggested Citation:
Hanaki, Nobuyuki and Kirman, Alan P. and Pezanis‐Christou, Paul, Counter Intuitive Learning: An Exploratory Study (August 08, 2016). CESifo Working Paper Series No. 6029. Available at SSRN: https://ssrn.com/abstract=2843476