Date Published: March 27, 2019
Publisher: Public Library of Science
Author(s): Matilda Rentoft, Daniel Svensson, Andreas Sjödin, Pall I. Olason, Olle Sjöström, Carin Nylander, Pia Osterman, Rickard Sjögren, Sergiu Netotea, Carl Wibom, Kristina Cederquist, Andrei Chabes, Johan Trygg, Beatrice S. Melin, Erik Johansson, Mathias Toft.
Whole-genome sequencing is a promising approach for human autosomal dominant disease studies. However, the vast number of genetic variants observed by this method constitutes a challenge when trying to identify the causal variants. This is often handled by restricting disease studies to the most damaging variants, e.g. those found in coding regions, and overlooking the remaining genetic variation. Such a biased approach explains in part why the genetic causes of disease in many families with dominantly inherited diseases remain unresolved today, in spite of these families being included in whole-genome sequencing studies. Here we explore the use of a geographically matched control population to minimize the number of candidate disease-causing variants without excluding variants based on assumptions about genomic position or functional predictions. To exemplify the benefit of the geographically matched control population we apply a typical disease variant filtering strategy in a family with an autosomal dominant form of colorectal cancer. With the use of the geographically matched control population we end up with 26 candidate variants genome-wide. This is in contrast to the tens of thousands of candidates left when only making use of available public variant datasets. The effect of the local control population is twofold: it (1) reduces the total number of candidate variants shared between affected individuals, and more importantly (2) increases the rate at which the number of candidate variants is reduced as additional affected family members are included in the filtering strategy. We demonstrate that the application of a geographically matched control population effectively limits the number of candidate disease-causing variants and may provide the means by which variants suitable for functional studies are identified genome-wide.
With the introduction of next-generation sequencing technologies, expectations were high that disease-causing genetic variants in familial diseases could be identified. However, the discoveries in recent years have in many cases been limited to the “low hanging fruit”, resolving familial diseases with a strong phenotype and an early age of onset and predominantly finding variants in coding regions and in known disease genes [1, 2]. A major reason for this is the high diversity of the human genome, which leads to a large number of candidate disease-causing variants within any given family. Prioritizing candidate variants for functional validation is difficult, often becomes biased towards already-known disease pathways, and has lately been shown to be of limited use in clinical settings [4, 5]. Large-scale functional studies are not feasible because of their cost and time requirements, which limits functional investigations to the strongest candidates.
In this study we have shown that it is possible to limit the final number of candidate disease-causing variants to a level where functional assays can be applied to identify the causal variant of a monogenic disease, even when using a whole-genome sequencing approach and without the use of common-practice functional prediction filters that may produce false negatives. In other words, it is possible to successfully employ an unbiased filtering strategy where variants found across the entire genome, regardless of predicted functional importance, are included in the analysis. There is, however, an absolute requirement to remove variants that are common, and therefore not disease-causing, in the geographic area from which the studied family originates.
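The filtering logic described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the variant representation, the function names `common_variants` and `candidate_counts`, the allele-frequency cutoff `max_af`, and the toy data are all assumptions made for illustration. The sketch keeps only variants shared by every affected family member included so far, removes variants that are common in a matched control population, and reports how the candidate count changes as family members are added.

```python
from typing import Dict, List, Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def common_variants(
    control_counts: Dict[Variant, int], n_chromosomes: int, max_af: float
) -> Set[Variant]:
    """Return variants whose allele frequency in the controls exceeds max_af.

    max_af is a hypothetical cutoff chosen for this example.
    """
    return {v for v, count in control_counts.items() if count / n_chromosomes > max_af}


def candidate_counts(affected: List[Set[Variant]], common: Set[Variant]) -> List[int]:
    """Count remaining candidates after including each affected family member.

    Candidates are variants shared by all members included so far,
    excluding variants that are common in the control population.
    No filter on genomic position or predicted function is applied.
    """
    counts: List[int] = []
    shared: Set[Variant] = set()
    for i, variants in enumerate(affected):
        shared = set(variants) if i == 0 else shared & variants
        counts.append(len(shared - common))
    return counts


# Toy data (invented for illustration only).
v1, v2 = ("1", 100, "A", "G"), ("2", 200, "C", "T")
v3, v4 = ("3", 300, "G", "A"), ("4", 400, "T", "C")

# Allele counts observed in 100 control individuals (200 chromosomes).
controls = {v1: 40, v2: 1, v3: 30}
common = common_variants(controls, n_chromosomes=200, max_af=0.01)

family = [{v1, v2, v3, v4}, {v1, v2, v4}, {v1, v4}]
print(candidate_counts(family, common))
```

In this toy example the common variants `v1` and `v3` are removed regardless of where they lie in the genome, and the candidate list shrinks only when an added family member lacks a remaining rare variant.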