Human behavior in contextual multi-armed bandit problems

Abstract

In real-life decision environments, people learn from their direct experience with alternative courses of action. Yet they can accelerate their learning by using functional knowledge about the features that characterize the alternatives. We designed a novel contextual multi-armed bandit task in which decision makers chose repeatedly between multiple alternatives characterized by two informative features. We compared human behavior in the contextual task with behavior in a classical multi-armed bandit task, in which decision makers did not have access to feature information. Behavioral analysis showed that participants in the contextual bandit used the features to direct their exploration toward promising alternatives. Ex post, we tested the participants' functional knowledge in one-shot multi-feature choice trilemmas. We computationally modeled the participants' behavior and compared a novel function-learning-based reinforcement learning model with classical reinforcement learning models. Although classical reinforcement learning models predict behavior better in the bandit experiment, the new models do better at predicting the trilemma choices.

