Research Article: Assessing the Pathogenicity, Penetrance, and Expressivity of Putative Disease-Causing Variants in a Population Setting

Date Published: January 18, 2019

Publisher: Elsevier

Author(s): Caroline F. Wright, Ben West, Marcus Tuke, Samuel E. Jones, Kashyap Patel, Thomas W. Laver, Robin N. Beaumont, Jessica Tyrrell, Andrew R. Wood, Timothy M. Frayling, Andrew T. Hattersley, Michael N. Weedon.


More than 100,000 genetic variants are classified as disease causing in public databases. However, the true penetrance of many of these rare alleles is uncertain and might be over-estimated by clinical ascertainment. Here, we use data from 379,768 UK Biobank (UKB) participants of European ancestry to assess the pathogenicity and penetrance of putatively clinically important rare variants. Although rare variants are harder to genotype accurately than common variants, we were able to classify as high quality 1,244 of 4,585 (27%) putatively clinically relevant rare (MAF < 1%) variants genotyped on the UKB microarray. We defined as “clinically relevant” variants that were classified as either pathogenic or likely pathogenic in ClinVar or are in genes known to cause two specific monogenic diseases: maturity-onset diabetes of the young (MODY) and severe developmental disorders (DDs). We assessed the penetrance and pathogenicity of these high-quality variants by testing their association with 401 clinically relevant traits. 27 of the variants were associated with a UKB trait, and we were able to refine the penetrance estimate for some of the variants. For example, the HNF4A c.340C>T (p.Arg114Trp) (GenBank: NM_175914.4) variant associated with diabetes is <10% penetrant by the time an individual is 40 years old. We also observed associations with relevant traits for heterozygous carriers of some rare recessive conditions, e.g., heterozygous carriers of the ERCC4 c.2395C>T (p.Arg799Trp) variant that causes Xeroderma pigmentosum were more susceptible to sunburn. Finally, we refute the previous disease association of RNF135 in developmental disorders. In conclusion, this study shows that very large population-based studies will help refine our understanding of the pathogenicity of rare genetic variants.

Partial Text

One of the ongoing challenges in genetic medicine is that of variant interpretation. Many variants and genes have been erroneously associated with disease as a result of study design problems (including ascertainment bias and inadequate cohort size) [1-3], as well as biological phenomena such as genetic heterogeneity, reduced penetrance, variable expressivity, composite phenotypes, pleiotropy, and epistasis [4-13]. These issues have resulted in ambiguity over how to interpret clinically ascertained variants found in individuals with no known family history or symptoms of the disease [14]. Although there has traditionally been a division between rare disease genetics (studied in small disease cohorts and individual high-risk families) and common disease genetics (studied in large disease cohorts and population biobanks), in reality a continuum of causality is likely for many human disorders [15]. Fortunately, rare and common disease studies suffer from opposing ascertainment biases. Clinical and family-based cohorts ascertained as a result of a specific clinical presentation will tend to overestimate the penetrance of any identified disease-causing variants [16]. In contrast, population cohorts tend to be enriched for healthy individuals (the so-called “healthy volunteer” selection bias) who have both the time and ability to volunteer for a study [17, 18], and they will therefore tend to underestimate penetrance. Population cohorts that have high-resolution genetic and clinical data are therefore invaluable for establishing minimum penetrance estimates, exploring variable expressivity, and challenging pathogenicity assertions made in the clinical arena.
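As a rough illustration of what a minimum penetrance estimate from a population cohort might look like, the sketch below computes the fraction of variant carriers who have the disease, together with a Wilson score confidence interval. The carrier counts are hypothetical and not taken from the study, the function name is illustrative, and the Wilson interval is one common choice for a binomial proportion rather than a method the article states it used.

```python
import math


def penetrance_with_wilson_ci(affected_carriers: int, total_carriers: int, z: float = 1.96):
    """Return (point estimate, lower bound, upper bound) for penetrance.

    Penetrance is taken as affected carriers / all carriers. The interval is
    a Wilson score interval; z = 1.96 corresponds to roughly 95% coverage.
    """
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must lie between 0 and total_carriers")

    n = total_carriers
    p = affected_carriers / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts for illustration only: 5 affected among 100 carriers.
estimate, lower, upper = penetrance_with_wilson_ci(5, 100)
print(f"penetrance ~ {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Because a population cohort is enriched for healthy volunteers, an estimate of this kind is best read as a lower bound on penetrance, in line with the argument above.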

Previous studies have been unable to analyze rare variants in sufficiently large population-based studies to establish pathogenicity and lower bounds for penetrance. Large population cohorts such as UKB provide an opportunity to investigate the relationship between genes and disease. However, the absence of genome-wide sequencing data has thus far limited the impact of UKB in the rare disease community. We have established a method, using combined intensity plots for individual variants across all genotyping batches, for evaluating the analytical validity of rare variants genotyped by microarray. Although we initially tried to examine variant cluster plots for each batch separately, as recommended by UKB, this proved impossible because of the rarity of most clinically important variants. Minor allele frequency (MAF) was an extremely good predictor of the likelihood that a variant would be genotyped well by the UKB arrays (Figure 1). At MAF > 0.005% (∼50 heterozygous individuals out of 500,000 in UKB) the false-positive rate (FPR) was ∼7%, and most variants were well genotyped, but the FPR was ∼60% at MAF > 0.001% (∼10 heterozygous individuals), and we classified all variants at MAF < 0.0005% (∼5 heterozygous individuals) as being low quality. This has important implications for epidemiological research carried out uncritically with these data. Although many rare variants in UKB are well genotyped with the arrays, the rarer the variant, the more likely it is to be of poor quality and therefore to yield false associations.

The authors declare no competing interests.
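The heterozygous carrier counts quoted for each MAF threshold can be reproduced with simple arithmetic. The sketch below assumes Hardy-Weinberg proportions and the approximate cohort size of 500,000 given in the text; the function name is illustrative and not part of any UKB tooling.

```python
def expected_heterozygotes(maf: float, n_individuals: int) -> float:
    """Expected number of heterozygous carriers under Hardy-Weinberg.

    With minor allele frequency p, the heterozygote frequency is 2 * p * (1 - p),
    which is very close to 2 * p when p is small.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must be a fraction between 0 and 0.5")
    return 2.0 * maf * (1.0 - maf) * n_individuals


N_UKB = 500_000  # approximate cohort size quoted in the text

# MAF thresholds from the text, given as percentages.
for maf_percent in (0.005, 0.001, 0.0005):
    maf = maf_percent / 100.0  # convert percent to a fraction
    carriers = expected_heterozygotes(maf, N_UKB)
    print(f"MAF {maf_percent}% -> ~{carriers:.0f} heterozygous carriers")
```

For these three thresholds the expected counts round to about 50, 10, and 5 carriers, which matches the figures in the text and shows how few observations underlie each genotype cluster at the lowest frequencies.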

