Date Published: May 23, 2012
Publisher: Hindawi Publishing Corporation
Author(s): Julien Allali, Cédric Saule, Cédric Chauve, Yves d’Aubenton-Carafa, Alain Denise, Christine Drevet, Pascal Ferraro, Daniel Gautheret, Claire Herrbach, Fabrice Leclerc, Antoine de Monte, Aida Ouangraoua, Marie-France Sagot, Michel Termier, Claude Thermes, Hélène Touzet.
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
Motivated by the fundamental role of RNAs, and especially of small noncoding RNAs, several methods for high-throughput generation of noncoding RNA candidates have been developed recently [1–3]. A fundamental problem is then to infer functional annotation for such putative RNA genes [4, 5], which often involves RNA structure comparisons. Most approaches to comparing RNA structures focus on the secondary structure, an intermediate level between the sequence and the full three-dimensional structure, which is both computationally tractable and relevant for functional genomics. The problem we consider here is the following: given a new RNA secondary structure (the query) and a database of known and annotated RNA secondary structures, which of these known structures share the most structural features with the query? Databases such as RFAM or RNA STRAND come naturally to mind, but in-house collections of RNA structures resulting from high-throughput experiments can also be considered.
A BRASERO benchmark, either provided on BRASERO or designed by a user, aims at assessing the ability of several pairwise RNA secondary structure comparison tools to correctly classify sequences into positive and negative sets with respect to a given reference set. This assessment is motivated by the practical problem of identifying similar structures (structural homologs) in a large RNA database (see Figure 1).
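As a rough illustration of this kind of assessment, the Python sketch below scores one tool by how well its distances separate a positive set from a negative set. The area under the ROC curve is used as an assumed evaluation measure, since the text above does not name one. The function name `auc_from_distances` and the toy distances are hypothetical and do not come from BRASERO.

```python
from typing import Sequence


def auc_from_distances(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Area under the ROC curve for distance scores (smaller = more similar).

    Computed as the fraction of (positive, negative) pairs in which the
    positive candidate is closer to the reference set; ties count as 0.5.
    """
    if not pos or not neg:
        raise ValueError("both the positive and the negative set must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical distances from each candidate to the reference set,
# as they might be reported by one comparison tool (assumed values).
positive_distances = [2.0, 3.5, 4.0]
negative_distances = [3.0, 6.0, 7.5, 9.0]

auc = auc_from_distances(positive_distances, negative_distances)
print(f"AUC = {auc:.3f}")
```

A value close to 1 would mean the tool ranks nearly all positive candidates ahead of the negatives, while a value near 0.5 would indicate no better than random separation.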
We illustrate here a typical use of the BRASERO website by comparing several programs that compute an edit distance or an alignment between pairs of RNA secondary structures, applied to a benchmark for the Signal Recognition Particle (SRP) RNA family. We compare seven tools: RNAdistance, RNAforester, MiGaL, TreeMatching, Gardenia, NestedAlign, and RNAStrAT. These tools rely on different models of secondary structures, such as ordered trees, multilayer models, and arc-annotated sequences, but are all based on the edit distance and alignment approach pioneered in [19, 21–23]. As these tools also differ in how they use primary sequence conservation, we included BLAST for comparison. For each tool, the default parameters were used.
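To make the notion of an arc-annotated sequence concrete, here is a minimal Python sketch that converts a secondary structure into its list of base-pair arcs. It assumes the input is written in dot-bracket notation without pseudoknots, which is an assumption of this example and not a format stated in the text above. The function name `dot_bracket_to_arcs` is hypothetical and is not part of any of the tools listed.

```python
from typing import List, Tuple


def dot_bracket_to_arcs(structure: str) -> List[Tuple[int, int]]:
    """Return the base pairs (i, j), 0-based with i < j, of a dot-bracket string."""
    stack: List[int] = []              # positions of currently unmatched '('
    arcs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            arcs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(arcs)


# A small hairpin: two stacked base pairs enclosing a two-nucleotide loop.
print(dot_bracket_to_arcs("((..))."))
```

Because the arcs of a pseudoknot-free structure are properly nested, the same list can also be read as an ordered tree, with each arc as an internal node and unpaired positions as leaves.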
BRASERO provides useful tools and benchmarks for evaluating software that compares RNA secondary structures. It can help researchers decide which tool to use for comparing new RNA secondary structures with a specific family, or identify good parameters for pairwise comparison tools when mining large sets of RNA secondary structures.