Learning and decisions in contextual multi-armed bandit tasks

Abstract

Contextual Multi-Armed Bandit (CMAB) tasks are a novel framework for assessing decision making in uncertain environments. In a CMAB task, participants are repeatedly presented with multiple options (arms), each described by a set of features (the context) that relate to the arm's reward. By choosing arms and observing the resulting rewards, participants can learn the relation between context and reward and improve their decision strategy over time. We present two studies of how people behave in CMAB tasks. In a stationary environment, we find that participants are best described by Thompson Sampling-based Gaussian Process models. In a dynamic CMAB task, participants are again best described by probability matching over Gaussian Process expectations. Our findings imply that behavior previously labeled "irrational" can instead be seen as a well-adapted strategy grounded in powerful inference algorithms.
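The learning loop the abstract describes — observe a context, pick an arm, observe its reward, update a model of the context–reward relation — can be sketched as Thompson Sampling with one Gaussian Process per arm. This is a minimal illustrative sketch, not the authors' implementation: the linear reward function, noise levels, kernel choice, and all variable names are assumptions, and it uses scikit-learn's `GaussianProcessRegressor` for the GP.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n_arms, n_features, n_trials = 4, 2, 50

# Hypothetical ground truth: each arm's reward is a noisy linear
# function of the context (unknown to the learner).
true_weights = rng.normal(size=(n_arms, n_features))

def reward(arm, context):
    return true_weights[arm] @ context + rng.normal(scale=0.1)

# One GP per arm, mapping context -> predicted reward.
models = [GaussianProcessRegressor(kernel=RBF(), alpha=0.1)
          for _ in range(n_arms)]
history = [([], []) for _ in range(n_arms)]  # (contexts, rewards) per arm

for t in range(n_trials):
    context = rng.normal(size=n_features)
    draws = []
    for a in range(n_arms):
        X, y = history[a]
        if not X:
            draws.append(rng.normal())  # no data yet: sample from the prior
        else:
            models[a].fit(np.array(X), np.array(y))
            mu, sigma = models[a].predict(context[None, :], return_std=True)
            draws.append(rng.normal(mu[0], sigma[0]))  # Thompson draw
    arm = int(np.argmax(draws))  # choose the arm with the best sampled value
    X, y = history[arm]
    X.append(context)
    y.append(reward(arm, context))
```

Sampling from each arm's posterior and picking the maximum makes choice probabilities track the probability that an arm is best, which is the probability-matching behavior the abstract reports.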
