Research Article: Expression Quantitative Trait loci (QTL) in tumor adjacent normal breast tissue and breast tumor tissue

Date Published: February 2, 2017

Publisher: Public Library of Science

Author(s): Alejandro Quiroz-Zárate, Benjamin J. Harshfield, Rong Hu, Nick Knoblauch, Andrew H. Beck, Susan E. Hankinson, Vincent Carey, Rulla M. Tamimi, David J. Hunter, John Quackenbush, Aditi Hazra, Kwong-Kwok Wong.


We investigate 71 single nucleotide polymorphisms (SNPs) identified in meta-analytic studies of genome-wide association studies (GWAS) of breast cancer, the majority of which are located in intergenic or intronic regions. To explore regulatory impacts of these variants, we conducted expression quantitative trait loci (eQTL) analyses on tissue samples from 376 invasive postmenopausal breast cancer cases in the Nurses’ Health Study (NHS) diagnosed from 1990–2004. Expression analysis was conducted on all formalin-fixed paraffin-embedded (FFPE) tissue samples (and on 264 adjacent normal samples) using the Affymetrix Human Transcriptome Array. Significance and ranking of associations between tumor receptor status and expression variation were preserved between NHS FFPE and TCGA fresh-frozen sample sets (Spearman r = 0.85, p<10^-10 for 17 of the 21 Oncotype DX recurrence signature genes). At an FDR threshold of 10%, we identified 27 trans-eQTLs associated with expression variation in 217 distinct genes. SNP-gene associations can be explored using an open-source interactive browser distributed in a Bioconductor package. Using a new procedure for testing hypotheses relating SNP content to expression patterns in gene sets, defined as molecular function pathways, we find that loci on 6q14 and 6q25 affect various gene sets and molecular pathways (FDR < 10%). Although the ultimate biological interpretation of the GWAS-identified variants remains to be uncovered, this study validates the utility of expression analysis of this FFPE expression set for more detailed integrative analyses.

Partial Text

Genome-wide association studies (GWAS) of breast cancer have identified at least 71 risk alleles[1–3]. The majority of these single nucleotide polymorphisms (SNPs) are in intergenic or intronic regions. However, determining the target gene or biological pathway associated with these germline risk loci in breast tissue has remained a challenge. Identification of expression quantitative trait loci (eQTLs) associated with these SNPs may help us to better understand the mechanisms by which these risk variants influence breast cancer susceptibility. Previous eQTL studies evaluated a subset of these SNPs[4,5], using breast cancer cell lines, lymphoblastoid cell lines, reduction mammoplasty samples[6] or fresh frozen breast tissue from The Cancer Genome Atlas (TCGA)[7,8]. Although formalin-fixed paraffin-embedded (FFPE) tissue is the most common type of tumor tissue collected in the clinic, no comprehensive eQTL analyses of the 71 SNPs have been reported in FFPE breast tumor and tumor adjacent normal tissue specimens.

We analyzed QTL data from 376 postmenopausal invasive breast tumor specimens and 264 tumor adjacent normal specimens derived from an initial pool of 867 HTA 1.0 CEL files (Fig 1). The mean age at breast cancer diagnosis was 57 years and the mean year of diagnosis was 1994. Of the breast tumors with analyzable expression arrays, 262 (70%) were documented as ER positive (ER+). ER status in medical reports was used to update 44 (12%) specimens with missing data on estrogen receptor status from the expression assay.

At a SNP-specific false discovery rate threshold of 10% in separate analyses of ER+, ER-, and tumor-adjacent normal paired samples, 27 of 71 meta-analytically identified breast cancer risk SNPs exhibited association with mean expression of at least one HTA 1.0 transcript cluster. The total number of genes significantly associated with these SNPs in trans is 217. Five SNPs exhibited association with mean expression of gene sets defined using Gene Ontology molecular function categories; three of these fQTL did not show significant association with any transcript cluster in SNP-gene association testing. Adding these three to the 27 SNPs above, this analysis therefore distinguishes a total of 30 of 71 breast cancer risk SNPs as potentially acting through effects on gene expression. Tables 1 and 2 indicate that the majority of these SNPs are in intergenic or intronic regions.
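To illustrate what a 10% false discovery rate threshold means in practice, the following minimal sketch applies the Benjamini-Hochberg step-up procedure to the p-values of a single SNP tested against several transcript clusters. This excerpt does not state which FDR method the study used, so the choice of Benjamini-Hochberg is an assumption, and the function name and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.10) -> List[bool]:
    """Flag which tests are significant under the Benjamini-Hochberg procedure.

    Assumption: BH is used only as a common example of FDR control; the
    excerpt does not say which FDR method the study applied.
    """
    m = len(pvalues)
    if m == 0:
        return []

    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values for one SNP tested against five transcript clusters.
snp_pvalues = [0.001, 0.2, 0.03, 0.04, 0.5]
significant = benjamini_hochberg(snp_pvalues, fdr=0.10)
n_significant = sum(significant)
```

In this sketch, applying the procedure separately to each SNP is one way to read "SNP-specific": the correction runs over the transcript clusters tested for that SNP, not over all SNP-transcript pairs at once. A SNP would then count toward the 27 if at least one of its tests is flagged.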