Date Published: August 16, 2017
Publisher: Public Library of Science
Author(s): Malka Gorfine, Sonja I. Berndt, Jenny Chang-Claude, Michael Hoffmeister, Loic Le Marchand, John Potter, Martha L. Slattery, Nir Keret, Ulrike Peters, Li Hsu, Zhi Wei.
The popular Genome-wide Complex Trait Analysis (GCTA) software uses random-effects models to estimate narrow-sense heritability from GWAS data of unrelated individuals, without knowing or identifying the causal loci. Many methods have since extended this approach to various situations. However, because the proportion of causal loci among the variants is typically very small and GCTA uses all variants to calculate the similarities among individuals, the estimation of heritability may be unstable, resulting in a large variance of the estimates. Moreover, if the causal SNPs are not genotyped, GCTA sometimes greatly underestimates the true heritability. We present a novel narrow-sense heritability estimator, named HERRA, that uses well-developed ultra-high dimensional machine-learning methods and, like existing methods, is applicable to continuous or dichotomous outcomes. Additionally, HERRA is applicable to time-to-event or age-at-onset outcomes, which, to our knowledge, no existing method can handle. Compared to GCTA and LDAK for continuous and binary outcomes, HERRA often has a smaller variance, and when causal SNPs are not genotyped, HERRA has a much smaller empirical bias. We applied GCTA, LDAK and HERRA to a large colorectal cancer dataset with a dichotomous outcome (4,312 cases, 4,356 controls, genotyped using Illumina 300K); the respective heritability estimates of GCTA, LDAK and HERRA are 0.068 (SE = 0.017), 0.072 (SE = 0.021) and 0.110 (SE = 5.19 × 10^-3). HERRA yields an over 50% increase in the heritability estimate compared to GCTA or LDAK.
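The "over 50% increase" figure can be checked directly from the three point estimates reported above. The short sketch below does only that arithmetic; the dictionary and the helper name `relative_increase` are illustrative and are not part of GCTA, LDAK, or HERRA.

```python
# Point estimates of narrow-sense heritability quoted in the abstract.
estimates = {"GCTA": 0.068, "LDAK": 0.072, "HERRA": 0.110}


def relative_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# Compare the HERRA estimate against each of the other two methods.
increase_vs_gcta = relative_increase(estimates["HERRA"], estimates["GCTA"])
increase_vs_ldak = relative_increase(estimates["HERRA"], estimates["LDAK"])

print(f"HERRA vs GCTA: {increase_vs_gcta:.1%}")
print(f"HERRA vs LDAK: {increase_vs_ldak:.1%}")
```

Both ratios exceed 0.5 (roughly 62% relative to GCTA and 53% relative to LDAK), which is consistent with the statement in the abstract. This compares point estimates only and says nothing about their standard errors.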
Heritability is a concept that summarizes the proportion of phenotypic variance that is due to genetic factors. Broad-sense heritability refers to genetic variation that may include effects due to additive genetic variation as well as dominance and epistasis, whereas narrow-sense heritability, h², refers to additive genetic variation only. Breakthroughs in high-throughput technologies have enabled researchers to conduct large-scale genome-wide association studies for many complex diseases. A question of key interest is how to estimate the (narrow-sense) heritability from genome-wide genotype data and thereby obtain an overall assessment of the extent to which genetic components are associated with complex traits, providing guidance for future discoveries of genetic loci.
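To make the definition of narrow-sense heritability concrete, the following minimal sketch simulates a purely additive trait and computes the ratio of additive genetic variance to total phenotypic variance. It illustrates the definition only; it is not HERRA, GCTA, or LDAK, since in real data the additive genetic values are unobserved and must be estimated. The function name `simulate_h2`, the allele-frequency range, the effect-size distribution, and all default parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_h2(n=2000, n_causal=20, target_h2=0.3, seed=0):
    """Simulate an additive trait; return var(genetic) / var(phenotype)."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies and additive effect sizes per causal SNP.
    freqs = [rng.uniform(0.1, 0.5) for _ in range(n_causal)]
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_causal)]

    # Additive genetic value: sum over SNPs of effect size times allele count.
    genetic = []
    for _ in range(n):
        total = 0.0
        for p, b in zip(freqs, betas):
            # Allele count in {0, 1, 2}: two independent draws at frequency p.
            count = (rng.random() < p) + (rng.random() < p)
            total += b * count
        genetic.append(total)

    var_g = statistics.pvariance(genetic)
    # Pick the noise variance so that var_g / (var_g + var_e) = target_h2.
    var_e = var_g * (1.0 - target_h2) / target_h2
    sd_e = var_e ** 0.5
    phenotype = [g + rng.gauss(0.0, sd_e) for g in genetic]

    # Narrow-sense heritability: additive variance over phenotypic variance.
    return var_g / statistics.pvariance(phenotype)


h2 = simulate_h2()
print(f"empirical h2 = {h2:.3f} (target 0.3)")
```

The empirical ratio should land close to the target, differing only by sampling noise. Dominance and epistasis are deliberately left out of the simulation, which is why the ratio corresponds to narrow-sense rather than broad-sense heritability.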
We provided simple, efficient, and consistent estimators (see Supporting Information for consistency proofs) of the narrow-sense heritability based on GWAS data for a continuous, categorical or age-at-onset outcome, where covariates can be readily incorporated. We showed, by simulation, that HERRA provides essentially unbiased results even if the causal SNPs are not genotyped, in contrast to the GCTA and LDAK estimators. For age-at-onset outcomes, we are the first to provide a narrow-sense heritability estimator based on GWAS data of unrelated individuals. The analysis of the case-control GECCO data demonstrates that the heritability estimates of GCTA, LDAK and HERRA can differ substantially.