Research Article: NovelSNPer: A Fast Tool for the Identification and Characterization of Novel SNPs and InDels

Date Published: October 31, 2011

Publisher: Hindawi Publishing Corporation

Author(s): Jens Aßmus, Armin O. Schmitt, Ralf H. Bortfeldt, Gudrun A. Brockmann.

http://doi.org/10.1155/2011/657341

Abstract

Typically, next-generation resequencing projects produce large lists of variants. NovelSNPer is a software tool that permits fast and efficient processing of such output lists. In a first step, NovelSNPer determines whether a variant is already known or previously unknown. In a second step, each variant is classified into one of 15 SNP classes or 19 InDel classes. Besides the classes used by Ensembl, we introduce POTENTIAL_START_GAINED and START_LOST as new functional classes and present a classification scheme for InDels. NovelSNPer is based upon the gene structure information stored in Ensembl. It processes two million SNPs in six hours. The tool can be used online or downloaded.

Partial Text

One of the main goals of resequencing projects is to detect genomic variation between an individual of interest and the reference genome that is stored in public databases. Typically, the whole genome or a section of the genome sequence encompassing the genomic region of interest is chosen as the reference sequence for the resequencing project. The DNA sample is manually enriched for nonrepetitive sequence in the target genomic region using capture oligomers before performing the sequencing reactions. Depending on the technology, the resulting reads range in length from a few dozen to a few hundred nucleotides. One sequencing reaction typically yields on the order of tens of millions of reads.

Whole genome resequencing experiments are performed to systematically identify genomic variations. For example, the complete genome of a single Bos taurus animal was sequenced to identify millions of previously unknown cattle SNPs [28]. In another work, artificial mutations that are responsible for phenotypes in Caenorhabditis elegans could be identified thanks to whole genome sequencing [38]. Distilling the huge quantity of information into meaningful lists of SNPs is a multi-step bioinformatics process. NovelSNPer is an easy-to-use tool that helps scientists with the analysis of next-generation sequencing data. Lengthy lists of SNPs from next-generation resequencing projects are efficiently assessed and annotated with the most important SNP features. Of utmost interest is the functional class of an SNP. SNPs involving stop gains (nonsense mutations) should in most cases severely impair a protein's function. Non-synonymous SNPs can also modify a protein's conformation, depending on how dissimilar the exchanged amino acids are. Whereas these two classes of SNPs can have a more or less direct effect on a protein (see, e.g., [39]), SNPs in untranslated regions (UTRs), introns, and up- and downstream regions have the potential to alter the binding behaviour of transcription factors or splice factors and thus to alter gene expression indirectly.
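The distinction between stop-gained, non-synonymous and synonymous coding SNPs can be illustrated with a minimal Python sketch. It is not NovelSNPer's implementation (the tool itself is written in Perl and draws gene structures from Ensembl); the function name `classify_codon_change`, the `is_start_codon` flag and the use of the standard genetic code are assumptions made for illustration, and the returned labels are illustrative class names only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon, is_start_codon=False):
    """Return an illustrative functional class for a single-codon SNP.

    ref_codon / alt_codon: three-letter DNA codons before and after the SNP.
    is_start_codon: hypothetical flag saying the codon is the annotated
    translation start of the transcript.
    """
    ref_codon = ref_codon.upper()
    alt_codon = alt_codon.upper()
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    # Losing the annotated ATG is reported before any amino-acid comparison.
    if is_start_codon and ref_codon == "ATG" and alt_codon != "ATG":
        return "START_LOST"
    if ref_aa == alt_aa:
        return "SYNONYMOUS_CODING"
    if alt_aa == "*":
        return "STOP_GAINED"
    if ref_aa == "*":
        return "STOP_LOST"
    return "NON_SYNONYMOUS_CODING"


if __name__ == "__main__":
    for ref, alt in [("TAC", "TAA"), ("GAA", "GAG"), ("GAA", "GTA")]:
        print(ref, alt, classify_codon_change(ref, alt))
```

A real annotation pipeline would additionally need the transcript structure (strand, exon boundaries, reading frame) to derive the affected codon from a genomic position; the sketch assumes that step has already been done.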

To make full use of the data generated in next-generation sequencing experiments, the lists of called variations must be screened efficiently. NovelSNPer is a fast tool that exploits the gene and genome annotations in the Ensembl database to classify each called variation as novel or as previously known. Each variation is classified into one or more of twenty-one functional variation classes. Two of these classes, START_LOST and POTENTIAL_START_GAINED, have so far rarely been included in SNP analysis programs. However, we showed that these two functional classes are quite frequent and could play a significant role in protein synthesis. The great number of species that can be analysed, the structured and detailed outputs, and the integration of new features such as additional variation classes or the calculation of a conservation score for each variation make NovelSNPer a versatile analysis tool. The ability to predict the effect of variations on a newly discovered transcript that is not yet in a public database is also very useful. We thus anticipate a wide range of applications in the biological, medical and agricultural sciences.
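The first screening step, separating previously known variations from novel ones, can be sketched as a simple set lookup. This is only an illustration under stated assumptions: NovelSNPer compares against the Ensembl database, whereas the sketch below uses an in-memory collection of known variations, and the `Variant` type and `split_known_novel` function are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal record for a called variation."""
    chrom: str
    pos: int   # 1-based position on the reference
    ref: str   # reference allele
    alt: str   # alternative allele


def split_known_novel(
    called: Iterable[Variant], known: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Split called variations into (previously known, novel).

    A variation counts as known only if chromosome, position and both
    alleles match an entry in `known`; this matching rule is an assumption.
    """
    known_set = set(known)  # hashing gives constant-time membership tests
    previously_known: List[Variant] = []
    novel: List[Variant] = []
    for variant in called:
        if variant in known_set:
            previously_known.append(variant)
        else:
            novel.append(variant)
    return previously_known, novel


if __name__ == "__main__":
    known = [Variant("1", 1000, "A", "G")]
    called = [Variant("1", 1000, "A", "G"), Variant("1", 2000, "C", "T")]
    found, new = split_known_novel(called, known)
    print(len(found), "known;", len(new), "novel")
```

Using a hash-based set keeps the lookup cost per variation roughly constant, which matters when lists contain millions of entries.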

NovelSNPer is implemented in Perl using BioPerl and Ensembl's Perl APIs. The latest version of the program can be downloaded free of charge from http://www2.hu-berlin.de/wikizbnutztier/software/NovelSNPer/. The source code is also available on the website.

 
