A Case of Divergent Predictions Made by Delta and Decay Rule Learning Models
- Darrell Worthy, Department of Psychological & Brain Sciences, Texas A&M University, College Station, Texas, United States
- A. Ross Otto, McGill University, Montréal, Quebec, Canada
- Astin Cornwall, Texas A&M University, College Station, Texas, United States
- Hilary Don, The University of Sydney, Sydney, NSW, Australia
- Tyler Davis, Psychological Sciences, Texas Tech University, Lubbock, Texas, United States
Abstract

The Delta and Decay rules are two learning rules used to update expected values in reinforcement learning (RL) models. The Delta rule learns average rewards, whereas the Decay rule learns cumulative rewards for each option. In a binary-outcome choice task, participants learned to select between pairs of options with reward probabilities of .65 (option A) versus .35 (option B), or .75 (option C) versus .25 (option D), on separate trials. Crucially, training included twice as many AB trials as CD trials, so participants experienced more cumulative reward from option A even though option C had the higher average reward rate (.75 versus .65). Participants then chose between novel combinations of options (e.g., A versus C). The Decay model predicted more A choices, whereas the Delta model predicted more C choices, because those options had the higher cumulative versus average reward values, respectively. Results were more in line with the Decay model’s predictions. This suggests that people may retrieve memories of cumulative reward to compute expected value, rather than learning an average reward for each option.
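The divergent predictions described in the abstract can be sketched in a short simulation. This is a minimal illustration under assumed parameter values (learning rate 0.1, decay 0.95) and a noise-free simplification in which the learner always samples the better option of each pair and receives its expected reward; it is not the authors' implementation.

```python
# Contrast Delta- and Decay-rule value updates on the described task
# structure. Parameter values are assumptions for illustration only.

ALPHA = 0.1   # Delta-rule learning rate (assumed)
DECAY = 0.95  # Decay-rule retention parameter (assumed)

p_reward = {"A": 0.65, "B": 0.35, "C": 0.75, "D": 0.25}
delta_v = {k: 0.0 for k in p_reward}  # learns average reward
decay_v = {k: 0.0 for k in p_reward}  # learns (decayed) cumulative reward

# Twice as many AB trials as CD trials; for simplicity the learner
# always samples the better option (A or C) and receives its
# expected reward rather than a stochastic binary outcome.
trials = ["A", "C", "A"] * 100

for choice in trials:
    r = p_reward[choice]  # expected reward (noise-free illustration)
    # Delta rule: nudge the chosen option's value toward the outcome,
    # so values converge to each option's average reward rate.
    delta_v[choice] += ALPHA * (r - delta_v[choice])
    # Decay rule: every value decays each trial, and the outcome is
    # added to the chosen option, so frequently rewarded options
    # accumulate higher values.
    for k in decay_v:
        decay_v[k] *= DECAY
    decay_v[choice] += r

print(f"Delta rule: V(A)={delta_v['A']:.3f}, V(C)={delta_v['C']:.3f}")  # C > A
print(f"Decay rule: V(A)={decay_v['A']:.3f}, V(C)={decay_v['C']:.3f}")  # A > C
```

Under these assumptions the Delta rule values converge toward the average reward rates (.65 for A, .75 for C), favoring C, while the Decay rule assigns A the higher value because A's rewards are twice as frequent, mirroring the two models' opposing predictions for A-versus-C test trials.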