Date Published: August 10, 2017
Publisher: Public Library of Science
Author(s): Irina Higgins, Simon Stringer, Jan Schnupp, Gennady Cymbalyuk.
The nature of the code used in the auditory cortex to represent complex auditory stimuli, such as naturally spoken words, remains a matter of debate. Here we argue that such representations are encoded by stable spatio-temporal patterns of firing within cell assemblies known as polychronous groups (PGs). We develop a physiologically grounded, unsupervised spiking neural network model of the auditory brain with local, biologically realistic, spike-timing-dependent plasticity (STDP) learning. We show that the plastic cortical layers of the network develop PGs which convey substantially more information about the speaker-independent identity of two naturally spoken word stimuli than does rate encoding, which ignores the precise spike timings. We furthermore demonstrate that such informative PGs can develop only if the spatio-temporal spike patterns arriving at the plastic cortical areas of the model are relatively stable.
The nature of the neural code used by the auditory brain to represent complex auditory stimuli, such as naturally spoken words, remains uncertain [1, 2]. A variety of spike rate and spike timing coding schemes have been debated. Rate encoding presumes that the identity of an auditory stimulus is encoded by the average firing rates of a subset of neurons, and that the precise timing of individual spikes is irrelevant. Temporal encoding suggests that different auditory stimuli are represented by spatio-temporal patterns of spiking activity within populations of neurons, where the relative timing of the spikes is part of the representation.
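The distinction between the two schemes can be illustrated with a minimal sketch. The spike trains, neuron labels, and time window below are invented for illustration and are not taken from the model: two stimuli evoke identical firing rates per neuron but differ in the relative timing between neurons, so only a temporal readout can separate them.

```python
# Two hypothetical spike trains (spike times in ms) with identical firing
# rates but different temporal structure. All values are illustrative.
train_a = {"n1": [5, 20, 35], "n2": [10, 25, 40]}  # n2 fires 5 ms after n1
train_b = {"n1": [5, 20, 35], "n2": [2, 17, 32]}   # n2 fires 3 ms before n1

def rate_code(train, window_ms=50.0):
    """Rate code: mean firing rate per neuron; spike timing is discarded."""
    return {n: len(spikes) / (window_ms / 1000.0) for n, spikes in train.items()}

def temporal_code(train):
    """Temporal code: spike latencies relative to the first n1 spike."""
    t0 = min(train["n1"])
    return {n: tuple(s - t0 for s in spikes) for n, spikes in train.items()}

# A pure rate decoder cannot separate the two stimuli...
assert rate_code(train_a) == rate_code(train_b)
# ...but the spatio-temporal pattern of spikes can.
assert temporal_code(train_a) != temporal_code(train_b)
```

The point of the sketch is only that averaging over a window destroys exactly the relative-timing structure that, on the temporal-coding view, carries stimulus identity.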
In this paper we propose that speaker-independent word representations are encoded by unique, discriminative spatio-temporal patterns of firing (PGs) in the auditory cortex. We test this hypothesis using a biologically inspired spiking neural network model of the auditory brain. Since computational restrictions prevent us from detecting the information-bearing PGs explicitly, we instead present multiple pieces of evidence for the emergence of informative PGs within the plastic cortical stages of our model: the change in the distribution of connection weights (w_ij^BL) after training that is characteristic of PG-based learning, the increased polychrony of firing in the final cortical stages of the model (measured by the polychronisation index we devised in Section Polychronization Index), and the performance of MLP decoders trained to be sensitive to the existence of stimulus-class-selective PGs. The observed differences on these measures between the full AN-CN-IC-CX and the reduced AN-CX models, and between rate and temporal encoding schemes, support our hypothesis that more information is carried by temporal PGs than by rate codes, and that such informative PGs can emerge only if stable input firing patterns are provided to the plastic cortical stages of our models.
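The weight-distribution evidence rests on how STDP responds to repeated, precisely timed spike pairs. As a hedged sketch of the standard pair-based STDP rule (not the paper's actual learning rule or parameters, which are defined in the model description), a synapse is potentiated when the presynaptic spike reliably precedes the postsynaptic spike and depressed otherwise; under stable spatio-temporal input this drives the weights of causally linked synapses upward and the rest downward:

```python
import math

# Illustrative pair-based STDP constants; not the model's parameters.
A_PLUS, A_MINUS = 0.01, 0.012      # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # exponential time constants (ms)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:      # pre before post: causal pairing, potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    elif dt < 0:    # post before pre: acausal pairing, depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0      # coincident spikes: no change in this formulation
```

Repeatedly applying such an update to stable input spike patterns tends to push each weight toward one of its bounds, producing the bimodal weight distribution commonly taken as a signature of PG-based learning; with unstable input, pre/post pairings are inconsistent and no such separation develops.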
In this paper we argued that a hierarchy of speaker-independent, informative PGs is learnt within the different stages of the plastic cortical layers of the full AN-CN-IC-CX model. The learning in the model, however, relies on stable firing patterns reaching the plastic cortical stages A1 and Belt. Such stable firing patterns are obscured by stochasticity in the raw AN firing rasters. Consequently, the cortical layers are essentially unable to learn speaker-independent representations of naturally spoken words from unprocessed AN input (the reduced AN-CX model). Subcortical preprocessing in the CN and IC stabilises and de-noises the AN firing patterns, allowing the cortical ensembles of the full AN-CN-IC-CX model to form category-specific response patterns.