Research Article: Stimulus-Dependent Adjustment of Reward Prediction Error in the Midbrain

Date Published: December 2, 2011

Publisher: Public Library of Science

Author(s): Hiromasa Takemura, Kazuyuki Samejima, Rufin Vogels, Masamichi Sakagami, Jiro Okuda, Jan Lauwereyns. http://doi.org/10.1371/journal.pone.0028337

Abstract

Previous reports have shown that neural activity in midbrain dopamine areas is sensitive to unexpected reward delivery and omission. This activity is correlated with the reward prediction error of reinforcement learning models, that is, the difference between the predicted reward value and the obtained reward outcome. These findings suggest that the reward prediction error signal in the brain updates reward prediction through stimulus–reward experiences. It remains unknown, however, how sensory processing of reward-predicting stimuli contributes to the computation of reward prediction error. To address this issue, we examined the relation between the discriminability of reward-predicting stimuli and the reward prediction error signal in the brain using functional magnetic resonance imaging (fMRI). Before the main experiments, subjects learned an association between the orientation of a perceptually salient (high-contrast) Gabor patch and a juice reward. The subjects were then presented with lower-contrast Gabor patch stimuli to predict a reward. We calculated the correlation between fMRI signals and reward prediction error in two reinforcement learning models: a model including the modulation of reward prediction by stimulus discriminability and a model excluding this modulation. Results showed that fMRI signals in the midbrain are more highly correlated with reward prediction error in the model that includes stimulus discriminability than in the model that excludes it. No regions showed higher correlation with the model that excludes stimulus discriminability. Moreover, the difference in correlation between the two models was significant from the first session of the experiment, suggesting that reward computation in the midbrain was modulated by stimulus discriminability before a new contingency between perceptually ambiguous stimuli and a reward had been learned. These results suggest that the human reward system can flexibly incorporate the level of stimulus discriminability into reward computations by modulating previously acquired reward values for a typical stimulus.

Partial Text

Reward prediction is an important function used by humans and animals to make appropriate decisions in various environments. Humans and animals learn whether the sensory information of incoming stimuli is rewarding or harmful through stimulus–reward experiences. Previous reports have described that reward prediction error (the difference between the predicted reward value and the obtained reward outcome) arises when the reward prediction associated with sensory stimuli is updated. Schultz and colleagues reported that the activity of dopamine neurons in monkey midbrain areas (ventral tegmental area, VTA, and substantia nigra) is strongly correlated with reward prediction error [1], [2], [3], [4]. Human neuroimaging studies have demonstrated that fMRI signals in the midbrain and basal ganglia are correlated with reward prediction error [5], [6], [7], [8]. Computational studies have described these reward prediction error activities using reinforcement learning models such as the Rescorla–Wagner model and the temporal difference (TD) model [2], [7], [9], [10], [11], [12], [13], [14]. These results suggest that the reward prediction error signal is represented in the midbrain dopamine neurons and that it is used for updating the association between reward prediction and sensory stimuli.
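
As a rough illustration of the reward prediction error described above, the following minimal sketch applies a Rescorla–Wagner style update to a single stimulus. The function name, the learning rate of 0.5, and the reward sequence are assumptions chosen for illustration; they are not taken from the article.

```python
def rescorla_wagner(rewards, learning_rate, initial_value=0.0):
    """Run a Rescorla-Wagner style update over a sequence of reward outcomes.

    Returns the prediction error on each trial and the predicted value
    held at the start of each trial. All parameter values are illustrative.
    """
    value = initial_value
    errors, values = [], []
    for reward in rewards:
        values.append(value)             # prediction before the outcome
        delta = reward - value           # reward prediction error
        errors.append(delta)
        value += learning_rate * delta   # move the prediction toward the outcome
    return errors, values


# Two rewarded trials followed by an unexpected omission (hypothetical data).
errors, values = rescorla_wagner([1.0, 1.0, 0.0], learning_rate=0.5)
print(errors)  # [1.0, 0.5, -0.75]
```

The error is positive when a reward is better than predicted, shrinks as the prediction improves, and turns negative when an expected reward is omitted.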

These results demonstrated that the neural activity in the midbrain is correlated significantly with the reward prediction error in the reinforcement learning model that includes the stimulus discriminability level as a factor (WITH model). This correlation was significantly higher than that obtained with a model without the stimulus discriminability factor (WITHOUT model). Higher correlation with the WITH model was observed consistently across the wide range of learning rates we tested, and no area showed higher correlation with the reward prediction error in the WITHOUT model than with that in the WITH model. Furthermore, this difference in correlation between the models appeared from the first session of the experiment. Taken together, these results support the view that the human reward system can incorporate the level of discriminability of perceptually degraded stimuli when calculating the reward prediction error, by adaptively modulating already-acquired reward values for distinctive stimuli according to stimulus discriminability information on a stimulus-by-stimulus basis.
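
The contrast between the WITH and WITHOUT models can be sketched as follows. This is not the article's formulation, which is not given in this excerpt. It assumes, purely for illustration, that discriminability is a probability of identifying the stimulus correctly and that the WITH model mixes the learned values of the two candidate stimuli by that probability. All names and numbers are hypothetical.

```python
def predicted_value(v_shown, v_other, discriminability, use_discriminability):
    """Predicted reward for the stimulus that was actually shown.

    Assumption (not from the article): discriminability is the probability
    of identifying the shown stimulus correctly. The WITH-style prediction
    is the probability-weighted mix of the two learned values, while the
    WITHOUT-style prediction ignores stimulus quality.
    """
    if not use_discriminability:
        return v_shown
    return discriminability * v_shown + (1.0 - discriminability) * v_other


def prediction_error(reward, v_shown, v_other, discriminability, use_discriminability):
    """Reward prediction error: obtained outcome minus predicted value."""
    prediction = predicted_value(v_shown, v_other, discriminability, use_discriminability)
    return reward - prediction


# Hypothetical trial: a low-contrast version of the rewarded orientation
# (learned value 1.0) is shown; the other orientation has learned value 0.0.
error_with = prediction_error(1.0, 1.0, 0.0, 0.75, use_discriminability=True)
error_without = prediction_error(1.0, 1.0, 0.0, 0.75, use_discriminability=False)
print(error_with, error_without)  # 0.25 0.0
```

Under these assumptions, a reward that follows a degraded stimulus still produces a positive prediction error in the WITH-style model, whereas the WITHOUT-style model treats the reward as fully predicted. Differences of this kind are what allow the two models' error signals to be compared against the fMRI data.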

Source:

http://doi.org/10.1371/journal.pone.0028337