Research Article: Identification of genetic markers for cortical areas using a Random Forest classification routine and the Allen Mouse Brain Atlas

Date Published: September 4, 2019

Publisher: Public Library of Science

Author(s): Natalie Weed, Trygve Bakken, Nile Graddis, Nathan Gouwens, Daniel Millman, Michael Hawrylycz, Jack Waters, Joseph Najbauer.


The mammalian neocortex is subdivided into a series of cortical areas that are functionally and anatomically distinct and are often distinguished in brain sections using histochemical stains and other markers of protein expression. We searched the Allen Mouse Brain Atlas, a database of gene expression, for novel markers of cortical areas. To screen for genes that change expression at area borders, we employed a random forest algorithm and binary region classification. We identified novel genetic markers for 19 of 39 areas and provide code that quickly and efficiently searches the Allen Mouse Brain Atlas. Our results demonstrate the utility of the random forest algorithm for cortical area classification, and the code may be used to facilitate the identification of genetic markers of cortical and subcortical structures and perhaps changes in gene expression in disease states.

Partial Text

The mammalian neocortex is classified into a series of anatomically and functionally distinct regions or cortical areas [1,2]. Areas are often identified using histochemical stains and antibodies to visualize differences in protein expression across cortex. Examples include cytochrome oxidase histochemistry and antibodies against m2 muscarinic receptors [3]. Numerous differences in expression across cortical areas have been observed, including abrupt changes in expression at area borders, more graded changes between areas, gradients in expression across an area, and changes in cell-specific expression [4–11].

Our aim was to identify genes with changes in expression at the borders of cortical areas in the mouse. From the Allen Mouse Brain Atlas, we took coronal in situ hybridization (ISH) data resampled to a canonical 3D reference space and overlaid the borders of cortical regions from the Allen Mouse Brain Reference Atlas, version 3. To identify genes with differential expression along these boundaries, we used a Random Forest algorithm, implemented in Python using the scikit-learn package.
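The binary classification idea described above can be illustrated with a minimal sketch. It is not the authors' code: the function name rank_genes_for_region, the synthetic expression matrix, and the parameter values are assumptions chosen for illustration, and real use would substitute voxel-wise ISH expression values and region labels from the atlas. The sketch trains a scikit-learn random forest to separate voxels inside one region from voxels outside it, then ranks genes by feature importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_genes_for_region(expression, in_region, gene_names, n_trees=100, seed=0):
    """Rank genes by their importance for separating one region from the rest.

    expression : (n_voxels, n_genes) array of expression values
    in_region  : (n_voxels,) boolean array, True for voxels inside the region
    Hypothetical helper for illustration only.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(expression, in_region)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]  # most important gene first
    return [(gene_names[i], float(importances[i])) for i in order]


# Synthetic stand-in for atlas data (an assumption, not real ISH values).
rng = np.random.default_rng(0)
n_voxels, n_genes = 400, 20
expression = rng.normal(size=(n_voxels, n_genes))

# Mark the first 100 voxels as belonging to the target region.
in_region = np.zeros(n_voxels, dtype=bool)
in_region[:100] = True

# Give one gene a step change in expression inside the region.
expression[in_region, 3] += 4.0

gene_names = [f"gene_{i}" for i in range(n_genes)]
ranking = rank_genes_for_region(expression, in_region, gene_names)
print(ranking[:3])
```

In this toy setup the gene with the step change should rank first. Repeating the call once per region, each time with a different in_region mask, would give one candidate list per cortical area.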

We used a Random Forest algorithm to reduce thousands of candidate genes to a short list of potential gene markers, applying this approach to 39 cortical regions in the mouse. We identified 44 putative markers, which together mark 19 of the 39 regions explored.



