Observational Learning of Exploration-Exploitation Strategies in Bandit Tasks

66 Pages Posted: 20 Feb 2024

Abstract

Situations that require balancing the gathering of new information against the exploitation of known options (i.e., that involve an exploration-exploitation trade-off) are pervasive. While navigating this trade-off, individuals frequently have the chance to observe and learn from others engaged in the same task. However, it remains unclear when and from whom people copy in exploration-exploitation tasks, and whether they imitate the observed agent’s choices or use the knowledge gained through observation to emulate the other players’ strategy. In two experiments, participants performed several nine-armed bandit tasks, either on their own or while seeing the choices of a fictitious agent that used either an explorative or an equally successful exploitative strategy. Subject-level parameters for copying and exploration were extracted from the data using a customized model-based reinforcement learning model. We find evidence that people’s inclination to copy depends on the certainty derived from their individually acquired knowledge. In addition, cognitive modeling supported the view that people rely on both types of observational learning: imitating the observed agents’ choices and adjusting their own exploration strategy toward the observed players’ inclination to explore without necessarily making the same choices. Finally, participants are more inclined to copy explorative agents than exploitative ones. Contrary to our expectations, neither similarity nor dissimilarity between the observers’ and the observed agents’ exploration tendencies predicts the inclination to copy. These results shed light not only on the impact of observational learning on exploration strategies but also on how humans process social and non-social information in exploration scenarios.

Keywords: exploration-exploitation trade-off, bandit task, computational modeling, observational learning, decision strategies

Suggested Citation

Danwitz, Ludwig and Helversen, Bettina von, Observational Learning of Exploration-Exploitation Strategies in Bandit Tasks. Available at SSRN: https://ssrn.com/abstract=4732127 or http://dx.doi.org/10.2139/ssrn.4732127

Ludwig Danwitz (Contact Author)

University of Bremen

Universitaetsallee GW I
Bremen, D-28334
Germany

Bettina von Helversen

University of Bremen

Universitaetsallee GW I
Bremen, D-28334
Germany
