Research Article: Reducing the worst case running times of a family of RNA and CFG problems, using Valiant’s approach

Date Published: August 18, 2011

Publisher: BioMed Central

Author(s): Shay Zakov, Dekel Tsur, Michal Ziv-Ukelson.

http://doi.org/10.1186/1748-7188-6-20

Abstract

RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70’s, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data.

We study Valiant’s classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant’s approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant’s technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars.

The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.

Partial Text

RNA research is one of the classical domains in bioinformatics, receiving increasing attention in recent years due to discoveries regarding RNA’s role in regulation of genes and as a catalyst in many cellular processes [1,2]. It is well-known that the function of an RNA molecule is heavily dependent on its structure [3]. However, due to the difficulty of physically determining RNA structure via wet-lab techniques, computational prediction of RNA structures serves as the basis of many approaches related to RNA functional analysis [4]. Most computational tools for RNA structural prediction focus on RNA secondary structures – a reduced structural representation of RNA molecules which describes a set of paired nucleotides, through hydrogen bonds, in an RNA sequence. RNA secondary structures can be relatively well predicted computationally in polynomial time (as opposed to three-dimensional structures). This computational feasibility, combined with the fact that RNA secondary structures still reveal important information about the functional behavior of RNA molecules, account for the high popularity of state-of-the-art tools for RNA secondary structure prediction [5].

As intervals of integers, matrices, and strings will be extensively used throughout this work, we first define some related notation.

In this section we describe a template that defines a class of problems, called the Inside Vector Multiplication Template (Inside VMT). We start by giving a simple motivating example in Section 3.1. Then, the class of Inside VMT problems is formally defined in Section 3.2, and in Section 3.3 an efficient generic algorithm for all Inside VMT problems is presented.

In this section we discuss how to solve another class of problems, denoted Outside VMT problems, by modifying the algorithm presented in the previous section. Similarly to Inside VMT problems, the goal of Outside VMT problems is to compute sets of outside properties α1, α2, …, αK corresponding to some input string (see notations in Section 2.3).

In this section we describe another extension to the VMT framework, intended for problems for which the instance is a set of strings, rather than a single string. Examples of such problems are the RNA Simultaneous Alignment and Folding problem [15,37], which is described in details in Appendix B, and the RNA-RNA Interaction problem [9]. Additional problems which exhibit a slight divergence from the presented template, such as the RNA-RNA Interaction Partition Function problem [12] and the RNA Sequence to Structured-Sequence Alignment problem [13], can be solved in similar manners.

This paper presents a simplification and a generalization of Valiant’s technique, which speeds up a family of algorithms by incorporating fast matrix multiplication procedures. We suggest generic templates that identify problems for which the approach is applicable, where these templates are based on general recursive properties of the problems, rather than on their specific algorithms. Generic algorithms are described for solving all problems sustaining these templates.

The authors declare that they have no competing interests.

SZ developed and formulated the algorithms, template definitions, and examples. MZU and DT mentored the research, contributed in writing parts of the manuscript, and were active in revising and preparing the final manuscript. All authors read and approved the final manuscript.

 

Source:

http://doi.org/10.1186/1748-7188-6-20

 

Leave a Reply

Your email address will not be published.