Date Published: July 6, 2017
Publisher: Public Library of Science
Author(s): Daniel Rasmussen, Aaron Voelker, Chris Eliasmith. Editor: Gennady Cymbalyuk.
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, to emphasize our aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and we generate several novel predictions.
One of the basic problems brains must solve is how to achieve good outcomes in unfamiliar environments. A rat trying to navigate a maze, a bird trying to decide where to forage, or a human trying to impress a new boss—all are faced with the problems of being in an unknown environment, having no clear indication of how to achieve their target, and executing a potentially lengthy sequence of decisions in order to achieve their goals.
We begin by discussing standard (non-hierarchical) reinforcement learning models, as several new developments incorporated in the NHRL (neural hierarchical reinforcement learning) model also address open issues there. We then discuss the much more sparsely populated domain of neural HRL models.
We have divided the structure of the NHRL model into three main components, which we term action values, action selection, and error calculation (shown in Fig 1). We begin by discussing each of these components in turn, and show how they implement their respective aspects of reinforcement learning. Together these components form a flat, non-hierarchical system. Although the underlying design decisions were made with the needs of a hierarchical system in mind (e.g., semi-Markov decision process (SMDP) processing), this aspect of the model can be understood without any reference to HRL. After the basic model is presented, we then show how these elements can be composed into a hierarchical structure.
The goals of this section are threefold. First and foremost, the purpose of these results is to demonstrate that the model works—that the action values, action selection, and error calculation components all perform the functions described above, and that together they are able to carry out the hierarchical reinforcement learning process. The second goal is to demonstrate that the model’s performance is consistent with neurophysiological data, in order to further support its biological plausibility. And third, we seek to demonstrate the advantages of hierarchical learning, in order to show the benefit of including these features in models of decision making.
In this work we have presented the first model to provide a detailed neural implementation of hierarchical RL. This model is able to perform HRL while incorporating important constraints of a biological system, such as local information transfer, continuous environments, temporally extended action sequences, and noisy/heterogeneous/imprecise components. By overcoming the challenges of these more general environments, the NHRL model brings us closer to understanding the complex performance of real brains, for which these challenges are the norm. More specifically, this provides important evidence that the abstract computations of HRL can be adapted so as to be plausibly implemented in real brains.