Research Article: Disease Gene Characterization through Large-Scale Co-Expression Analysis

Date Published: December 31, 2009

Publisher: Public Library of Science

Author(s): Allen Day, Jun Dong, Vincent A. Funari, Bret Harry, Samuel P. Strom, Dan H. Cohn, Stanley F. Nelson, Chad Creighton.

Abstract: In the post-genome era, a major goal of biology is the identification of specific roles for individual genes. We report a new genomic tool for gene characterization, the UCLA Gene Expression Tool (UGET).

Partial Text: The completion of the human genome, the elucidation of most protein-coding genes, and the development of new tools for assessing genomic variation and regulation have greatly facilitated our ability to identify specific genes and gene variants involved in diverse human traits. As information accumulates, there is substantial promise that advances in biological understanding will come through integrative approaches that combine genomic data acquired from many sources [1], [2], [3], [4]. One of the largest sources of information is genome-wide gene expression data, made possible through academic and commercial efforts [5], [6].

Our aim was to create a tool that lets scientists explore the data available within Celsius, and to demonstrate the utility of these data for human disease gene identification as a general proxy for the information within the dataset. To do this, we created data matrices of gene-gene correlations and demonstrate two simple methods for mining the resulting matrix of correlation coefficients. For these demonstrations we use the Affymetrix HG-U133_Plus_2 array design. In these analyses we use the probeset-to-gene-symbol mappings available from NetAffx [34] and the probeset genome alignments available from the UCSC Genome Browser [35], and we exclude all other information about the microarray experiments that were performed. For the prediction of gene function we used human-reviewed Gene Ontology (GO) Biological Process (BP) codes, as available from Bioconductor [36], [37]. In all cases, metadata about the biological samples, sample treatments, and other conditions of the original experiments were omitted from our analyses in order to demonstrate the power of the approach in the absence of annotation data.
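The text does not state here which correlation statistic is used or how the matrix is computed. The following is a minimal sketch that assumes Pearson correlation of each pair of expression profiles across the same set of arrays, with genes (or probesets) as keys. The gene names and values are hypothetical toy data, and the helper names (`pearson`, `correlation_matrix`, `top_coexpressed`) are illustrative, not part of UGET.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(expr: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Symmetric gene-by-gene matrix of Pearson r; the diagonal is set to 1.0."""
    genes = list(expr)
    mat: Dict[str, Dict[str, float]] = {g: {} for g in genes}
    for i, g in enumerate(genes):
        mat[g][g] = 1.0
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            mat[g][h] = mat[h][g] = r
    return mat

def top_coexpressed(mat: Dict[str, Dict[str, float]], gene: str, k: int = 5) -> List[Tuple[str, float]]:
    """The k genes most positively correlated with `gene`, excluding itself and undefined values."""
    others = [(h, r) for h, r in mat[gene].items() if h != gene and not math.isnan(r)]
    others.sort(key=lambda t: t[1], reverse=True)
    return others[:k]

# Hypothetical toy data: four genes measured on four arrays.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
mat = correlation_matrix(expr)
print(top_coexpressed(mat, "GENE_A", k=2))  # GENE_B (r = 1.0), then GENE_D (r = 0.8)
```

Pure Python keeps the sketch dependency-free; a matrix covering every probeset on the HG-U133_Plus_2 design would in practice call for a vectorized library such as NumPy.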

We describe here the creation of a new web-accessible gene-gene correlation resource, and demonstrate the power and utility of a large collection of gene expression microarray data for functional gene discovery and for prioritizing genes for mutation analysis within linkage regions.
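The prioritization procedure itself is not detailed in this section. As an illustration only, and not necessarily the authors' method, one simple guilt-by-association scheme ranks each candidate gene in a linkage region by its mean correlation with a set of "seed" genes already implicated in the phenotype. The seed list, the precomputed `corr` lookup, and all gene names below are hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Assumed input: precomputed correlation coefficients, corr[gene][other_gene] = r.
Corr = Dict[str, Dict[str, float]]

def prioritize_candidates(corr: Corr, candidates: List[str], seeds: List[str]) -> List[Tuple[str, float]]:
    """Rank linkage-region candidates by mean correlation with known (seed) disease genes.

    Seed pairs missing from the lookup are skipped; a candidate with no
    usable seed correlations is left out of the ranking entirely.
    """
    ranked = []
    for gene in candidates:
        row = corr.get(gene, {})
        rs = [row[s] for s in seeds if s in row and s != gene]
        if rs:
            ranked.append((gene, sum(rs) / len(rs)))
    ranked.sort(key=lambda t: t[1], reverse=True)  # strongest co-expression first
    return ranked

# Hypothetical example: three candidates in a linkage interval, two seed genes.
corr = {
    "CAND1": {"SEED1": 0.9, "SEED2": 0.7},
    "CAND2": {"SEED1": 0.1, "SEED2": -0.2},
    "CAND3": {"SEED1": 0.5},
}
ranking = prioritize_candidates(corr, ["CAND1", "CAND2", "CAND3", "CAND4"], ["SEED1", "SEED2"])
print(ranking)  # CAND1 (~0.8), CAND3 (0.5), CAND2 (~-0.05); CAND4 has no data
```

Averaging over seeds rewards candidates that track the whole known pathway rather than a single gene; taking the maximum instead would favor any strong single partner.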
