Self-Organization of Policy by Symmetric Reasoning and its Application to Reinforcement Learning

Abstract

The real limitations on our cognition and the locality of our knowledge give rise to the exploration-exploitation dilemma, and hence to the tradeoff between speed and accuracy. Given that living creatures, animals in particular, deal with this tradeoff efficiently, it is natural to suppose that we handle the dilemma intuitively. We adopt the loosely symmetric (LS) formula as a toy model of such intuitive judgment. LS, a form of biased conditional probability, is known to describe human probability judgment precisely and to surpass the ordinary speed-accuracy tradeoff in two-armed bandit problems. In this study, we analyze LS with respect to its information-theoretic and logical-probabilistic nature. Building on these analyses, we extend LS from a conditional probability to a general value function. The efficacy of LS in decision-making under risk and in reinforcement learning is demonstrated by experiments and simulations.
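To make the two-armed bandit setting and the speed-accuracy tradeoff concrete, the following is a minimal sketch in Python. Note that the agent here is a standard epsilon-greedy baseline valuing each arm by its ordinary empirical conditional probability of reward; it is NOT the authors' LS value function, which is defined in the body of the paper. All names (`TwoArmedBandit`, `EmpiricalMeanAgent`, the arm probabilities) are illustrative assumptions.

```python
import random


class TwoArmedBandit:
    """Two Bernoulli arms with fixed, hidden reward probabilities."""

    def __init__(self, p_arms, seed=None):
        self.p_arms = p_arms
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Return 1 with probability p_arms[arm], else 0.
        return 1 if self.rng.random() < self.p_arms[arm] else 0


class EmpiricalMeanAgent:
    """Epsilon-greedy baseline, NOT the LS formula from the paper.

    It values each arm by the ordinary conditional probability
    P(reward | arm), estimated as the empirical mean, and so exhibits
    the speed-accuracy tradeoff that LS is claimed to go beyond.
    """

    def __init__(self, n_arms=2, epsilon=0.1, seed=None):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def value(self, arm):
        # Empirical mean reward; 0.5 as an uninformed prior guess.
        if self.counts[arm] == 0:
            return 0.5
        return self.sums[arm] / self.counts[arm]

    def choose(self):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.randrange(len(self.counts))
        vals = [self.value(a) for a in range(len(self.counts))]
        return vals.index(max(vals))  # exploit the current best estimate

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def run(n_steps=1000, seed=0):
    bandit = TwoArmedBandit([0.4, 0.6], seed=seed)
    agent = EmpiricalMeanAgent(seed=seed)
    total_reward = 0
    for _ in range(n_steps):
        arm = agent.choose()
        reward = bandit.pull(arm)
        agent.update(arm, reward)
        total_reward += reward
    return agent, total_reward
```

Swapping the `value` method for an LS-style biased conditional probability is the change the paper studies; the rest of the loop stays the same.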
