Date Published: November 16, 2016
Publisher: Public Library of Science
Author(s): Johannes Zschache, Long Wang.
Melioration learning is an empirically well-grounded model of reinforcement learning. By means of computer simulations, this paper derives predictions for several repeatedly played two-person games from this model. The results indicate a likely convergence to a pure Nash equilibrium of the game. If no pure equilibrium exists, the relative frequencies of choice may approach the predictions of the mixed Nash equilibrium. Yet in some games, no stable state is reached.
Various learning models have been analysed in the game-theoretic literature. The best-known ones, such as fictitious play or Bayesian learning, describe normative processes that enable the players to find an equilibrium during the repeated play of a game. Those models presume that information about the preferences and past actions of all players is available. More recently, researchers have evaluated whether equilibria can be reached without knowing the preferences of other players or even without considering the other players’ presence. The latter condition was called radically or completely uncoupled learning.
Established by Herrnstein and Vaughan, melioration learning is a theory of individual decision-making from behavioural psychology. It was introduced as an explanation of the matching law, which describes an often observed regularity of individual behaviour [13–23]. In the past, many empirical studies have validated the predictions of melioration learning [24–31].
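For two alternatives, the matching law is commonly written as below. The symbols are chosen here for illustration and are not taken from the paper: B_i is the amount of behaviour allocated to alternative i, and R_i is the reinforcement obtained from it. Melioration is usually described as a shift of behaviour toward the alternative with the higher local rate of reinforcement R_i / B_i, so a stable allocation requires equal local rates, which is equivalent to matching.

```latex
% Matching law for two alternatives (illustrative notation)
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}

% Melioration is at rest when the local reinforcement rates are equal,
% which rearranges to the matching relation above:
\frac{R_1}{B_1} = \frac{R_2}{B_2}
\;\Longleftrightarrow\;
\frac{B_1}{B_2} = \frac{R_1}{R_2}
\;\Longleftrightarrow\;
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
```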
Algorithms 1 and 2 were applied to different two-person games by means of agent-based simulations. The simulations were implemented in NetLogo. All games are presented in normal form. The two players, who are also called agents, are labelled “x” and “y”. Capitalised letters or integers denote the alternatives. The following rules specify the simulations.
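The sketch below is not the paper's rule set or its Algorithms 1 and 2. It is a minimal illustration, in Python rather than NetLogo, of how a repeated two-person game in normal form can be simulated with completely uncoupled learners. It assumes a generic ε-greedy rule over each alternative's running average reward as a stand-in for melioration. The names `pure_nash`, `simulate` and `PD`, the payoff values, the exploration rate of 0.1, the number of rounds and the initial value of 0 for untried alternatives are all hypothetical choices made for this example.

```python
import random


def pure_nash(payoffs):
    """Return all pure Nash equilibria of a two-person normal-form game.

    payoffs maps (action_x, action_y) -> (reward_x, reward_y).
    """
    xs = sorted({a for a, _ in payoffs})
    ys = sorted({b for _, b in payoffs})
    return [
        (a, b)
        for a in xs
        for b in ys
        # Neither agent can gain by deviating unilaterally.
        if all(payoffs[(a, b)][0] >= payoffs[(a2, b)][0] for a2 in xs)
        and all(payoffs[(a, b)][1] >= payoffs[(a, b2)][1] for b2 in ys)
    ]


def simulate(payoffs, rounds=5000, epsilon=0.1, seed=0):
    """Repeated play by two learners that see only their own choices and rewards.

    Assumed rule (illustrative): with probability epsilon pick a random
    alternative, otherwise pick the one with the highest average reward so far.
    Returns relative choice frequencies per agent and alternative.
    """
    rng = random.Random(seed)
    actions = {
        "x": sorted({a for a, _ in payoffs}),
        "y": sorted({b for _, b in payoffs}),
    }
    total = {p: {a: 0.0 for a in actions[p]} for p in actions}
    count = {p: {a: 0 for a in actions[p]} for p in actions}

    def average(p, a):
        # Untried alternatives are valued at 0 (an arbitrary assumption).
        return total[p][a] / count[p][a] if count[p][a] else 0.0

    def choose(p):
        if rng.random() < epsilon:
            return rng.choice(actions[p])  # explore
        # Exploit; ties go to the first alternative in sorted order.
        return max(actions[p], key=lambda a: average(p, a))

    for _ in range(rounds):
        a_x, a_y = choose("x"), choose("y")
        r_x, r_y = payoffs[(a_x, a_y)]
        # Each agent updates only its own statistics (completely uncoupled).
        for p, a, r in (("x", a_x, r_x), ("y", a_y, r_y)):
            total[p][a] += r
            count[p][a] += 1

    return {p: {a: count[p][a] / rounds for a in actions[p]} for p in actions}


# Prisoner's dilemma with illustrative payoffs (not taken from the paper).
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

if __name__ == "__main__":
    print(pure_nash(PD))
    print(simulate(PD))
```

Here `pure_nash(PD)` returns `[("D", "D")]`, which gives a reference point for comparing the simulated choice frequencies. Setting `epsilon=0.0` removes exploration, so both learners stay with whichever alternative they tried first; this shows why some exploration is needed before any convergence claim can be examined.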
A simple process of completely uncoupled learning was investigated. It differs from previous models such as regret-testing or trial-and-error learning because, on the one hand, it is derived from empirical research and, on the other hand, the convergence to equilibrium states in social interactions is not guaranteed.