Research Article: Structural prediction of RNA switches using conditional base-pair probabilities

Date Published: June 12, 2019

Publisher: Public Library of Science

Author(s): Amirhossein Manzourolajdad, John L. Spouge, Yaakov Koby Levy.

http://doi.org/10.1371/journal.pone.0217625

Abstract

An RNA switch triggers biological functions by toggling between two conformations. RNA switches include bacterial riboswitches, where ligand binding can stabilize a bound structure. For RNAs with only one stable structure, structural prediction usually just requires a straightforward free energy minimization, but for an RNA switch, the prediction of a less stable alternative structure is often computationally costly and even problematic. The current sampling-clustering method predicts stable and alternative structures by partitioning structures sampled from the energy landscape into two clusters, but it is very time-consuming. Instead, we predict the alternative structure of an RNA switch from conditional probability calculations within the energy landscape. First, our method excludes base pairs related to the most stable structure in the energy landscape. Then, it detects stable stems (“seeds”) in the remaining landscape. Finally, it folds an alternative structure prediction around a seed. While having comparable riboswitch classification performance, the conditional-probability computations had fewer adjustable parameters, offered greater predictive flexibility, and were more than one thousand times faster than the sampling step alone in sampling-clustering predictions, the competing standard. Overall, the described approach helps traverse thermodynamically improbable energy landscapes to find biologically significant substructures and structures rapidly and effectively.

Partial Text

In many organisms, structural rearrangements of RNA switches trigger biological functions. In bacteria, RNA switches regulate gene expression of mRNA downstream from them [1, 2]. In eukaryotes, they regulate alternative splicing [3]. In viruses, they can be critical in various stages of the viral life cycle, regulating rates of replication, transcription, RNA dimerization, etc. [4–9]. Plasticity, the ability to assume more than one structure, can also enhance the adaptability of RNA [10] by permitting it to accommodate distinct conformational phenotypes with only small perturbations to its genotype.

Sampling-Clustering (SC) procedure for predicting the alternative structure. Given an RNA sequence, the energy landscape of the RNA is sampled at several temperatures: 300 structures at 37°C, plus 150 structures at each of six temperatures at decile intervals towards the melting temperature of the RNA strand, for a total of 1200 samples per RNA. Sample numbers were selected according to the same SC procedure used in [54], for comparison purposes. The samples are then partitioned into two clusters by k-means clustering under the base-pair Hamming distance dH (the distance dH{S’,S”} counts the base pairs that are in either of the structures S’ and S” but not in both). Denote the most energetically stable secondary structure configuration of a riboswitch by S1 and the alternative secondary structure by S2. As in [18], the computed MFE structure (denoted by S1*) was then used to predict the most stable structure S1. The lowest-energy structure of the cluster not containing the MFE structure (denoted by Ŝ2*, the hat often denoting a sampled quantity in statistics) was used to predict the alternative structure S2.
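The base-pair Hamming distance and the sampling budget described above can be illustrated with a minimal Python sketch. It assumes structures are given as pseudoknot-free dot-bracket strings; the function names `base_pairs` and `bp_hamming` are hypothetical and are not taken from the authors' code.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def bp_hamming(s1, s2):
    """Count the base pairs that occur in exactly one of the two structures."""
    return len(base_pairs(s1) ^ base_pairs(s2))


# Sampling budget stated in the text: 300 structures at 37 °C plus
# 150 structures at each of six higher temperatures.
total_samples = 300 + 6 * 150
```

Under this definition, two structures that share no base pairs are separated by the sum of their base-pair counts, which is what allows a two-cluster partition to separate a stable fold from a disjoint alternative fold.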

The Results section focuses on predicting the alternative structure. Here, the most energetically stable structure prediction is the MFE structure S1*, so its precise quality depends on the thermodynamic model. Furthermore, both the CP and SC methods use the MFE structure to predict the most stable structure. Therefore, the Results section omits comment on the stable structure prediction.

Computational identification of RNA switches and substructures solely from sequence could elucidate biological control mechanisms in many species, and successful identification of switches could accelerate the discovery of new sensors and control mechanisms, particularly in bacteria and other prokaryotes. For an RNA switch (or more specifically, for a typical riboswitch), predicting an alternative biologically functional structure can be challenging. Such predictions, however, hold the key to understanding many regulatory mechanisms. Although riboswitches are a very diverse subset of RNA switches, there are not enough experimentally verified data to optimize the parameters of a universal RNA Switch Predictor.

In this article, we use exact conditional base-pair probabilities to predict the alternative structure of RNA switches. Our approach selects base pairs associated with high conditional probabilities, after it excludes substructures in the primary metastable structure (here, the MFE structure). Conditioning on exclusion improves the chance that an exact (McCaskill) probability calculation finds base pairs in an alternative structure. Within the limitations imposed on our ROC tests by available data, Conditional-Probability (CP) computations had classification accuracy comparable to Sampling-Clustering (SC) computations. In contrast, the results for computational speed were not at all tentative: CP was more than 1000 times faster than SC, its speed making it a much more promising predictor of alternative structures in computationally demanding settings like genomic RNA.
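The effect of conditioning on exclusion can be seen in a toy Python sketch. It is not the authors' CondAlt implementation: instead of a McCaskill partition-function calculation, it enumerates a small hand-written ensemble whose structures and energies are invented for illustration, and the function name `conditional_bp_probs` is hypothetical. The sketch removes every structure that shares a base pair with the MFE structure, then renormalizes the Boltzmann weights of the rest.

```python
import math
from collections import defaultdict

RT = 0.0019872 * 310.15  # gas constant times temperature, kcal/mol at 37 °C


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def conditional_bp_probs(ensemble, excluded):
    """Base-pair probabilities given that no pair in `excluded` is formed.

    ensemble: list of (dot_bracket, energy in kcal/mol) tuples.
    excluded: set of (i, j) base pairs to condition against.
    """
    weights = defaultdict(float)
    z = 0.0  # partition function restricted to the allowed structures
    for structure, energy in ensemble:
        pairs = base_pairs(structure)
        if pairs & excluded:
            continue  # structure uses an excluded pair, so it is dropped
        w = math.exp(-energy / RT)
        z += w
        for pair in pairs:
            weights[pair] += w
    return {pair: w / z for pair, w in weights.items()} if z > 0 else {}


# Invented toy ensemble (structures and energies are assumptions).
toy_ensemble = [
    ("((((....))))....", -5.0),  # MFE structure
    ("(((......)))....", -3.5),  # shares pairs with the MFE structure
    ("....((((....))))", -3.0),  # disjoint alternative stem
    ("................", 0.0),   # open chain
]
mfe_structure = min(toy_ensemble, key=lambda item: item[1])[0]
probs = conditional_bp_probs(toy_ensemble, base_pairs(mfe_structure))
```

In this toy ensemble, the alternative stem carries only a small share of the unconditional Boltzmann weight, but after conditioning its base pairs dominate, which is the kind of high-probability stem that could serve as a seed.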

Our CondAlt source code for predicting alternative structures is publicly available for download at https://go.usa.gov/xRu79. The data used in the first dataset (Barsacchi) are available in S2 File and are also publicly available in [54]. The data used in the second dataset (Purine aptamers) were taken from Rfam. Please refer to Material and Methods for further details.

 
