Date Published: March 23, 2017
Publisher: Public Library of Science
Author(s): Shweta Bhandare, Debra S. Goldberg, Robin Dowell, Georg Stoecklin.
The RNA binding proteins (RBPs) human antigen R (HuR) and tristetraprolin (TTP) are known to exhibit competitive binding but have opposing effects on the bound messenger RNA (mRNA). How cells discriminate between the two proteins is an interesting problem. Machine learning approaches, such as support vector machines (SVMs), may be useful in the identification of discriminative features. However, these methods have yet to be applied to studies of RNA binding protein motifs.
Applying the k-spectrum kernel to a support vector machine (SVM), we first verified the published binding sites of both HuR and TTP. Additional feature engineering highlighted the U-rich binding preference of HuR and AU-rich binding preference for TTP. Domain adaptation along with multi-task learning was used to predict the common binding sites.
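As a rough illustration of the k-spectrum kernel mentioned above, the sketch below counts every overlapping k-mer in two sequences and takes the inner product of the two count vectors. It is a minimal pure-Python sketch, not the authors' implementation; the function names `kmer_counts` and `spectrum_kernel` and the example sequences are hypothetical, and converting T to U is an assumption made so that DNA-style input is handled.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    # Assumption: treat T as U so DNA-style input maps onto RNA k-mers.
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


# Made-up example sequences: a U-rich site and an AU-rich site.
u_rich = "UUUUUUUU"
au_rich = "AUUUAUUUA"
print(spectrum_kernel(u_rich, au_rich, 3))
```

Evaluating this kernel for every pair of training sequences gives a Gram matrix, which could then be handed to any SVM implementation that accepts a precomputed kernel.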
The distinction between HuR and TTP binding appears to lie in subtle sequence-content features. HuR prefers strongly U-rich sequences, whereas TTP prefers AU-rich sequences: as A content increases, sequences are more likely to be bound only by TTP. Our model is consistent with competitive binding of the two proteins, particularly at sequences with an intermediate A/U balance. This suggests that fine changes in the A/U balance within an untranslated region (UTR) can alter the binding and subsequent stability of the message. Both feature engineering and domain adaptation emphasized the extent to which these proteins recognize similar general sequence features. This work suggests that the k-spectrum kernel method could be useful for studying RNA binding proteins, and that domain adaptation techniques such as feature augmentation could be employed, particularly when examining RBPs with similar binding preferences.
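Feature augmentation for domain adaptation is commonly implemented by giving each example a shared copy of its features plus a copy specific to its own domain. The sketch below assumes that common form; the text does not spell out the exact variant used, so this should not be read as the authors' method. The names `augment` and `dot`, the domain labels, and the k-mer counts are all hypothetical.

```python
def augment(features, domain, domains=("HuR", "TTP")):
    """Return a sparse augmented feature dict: one shared copy and one
    domain-specific copy of each feature. Copies for the other domains
    are implicitly zero because the dict is sparse."""
    if domain not in domains:
        raise ValueError(f"unknown domain: {domain!r}")
    out = {("shared", name): value for name, value in features.items()}
    out.update({(domain, name): value for name, value in features.items()})
    return out


def dot(a, b):
    """Inner product of two sparse feature dicts."""
    return sum(value * b.get(key, 0) for key, value in a.items())


# Made-up k-mer counts for one sequence, viewed as a HuR or a TTP example.
x = {"UUU": 2, "AUU": 1}
same_domain = dot(augment(x, "HuR"), augment(x, "HuR"))
cross_domain = dot(augment(x, "HuR"), augment(x, "TTP"))
print(same_domain, cross_domain)
```

Under this scheme, examples from the same protein are twice as similar as examples from different proteins with identical raw features, so a linear model can learn shared weights for what both proteins recognize and separate weights for protein-specific signal.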
RNA binding proteins (RBPs) are crucial regulators of numerous post-transcriptional processes [1, 2]. RBPs identify their RNA targets in a highly specific fashion, through recognition of specific primary sequence and/or secondary structure. Many RNA binding proteins recognize AU-rich sequence elements, including human antigen R (HuR) and tristetraprolin (TTP). The association between AU-rich elements and particular proteins alters the stability of the RNA. For example, binding by HuR protects messenger RNA (mRNA) from degradation whereas binding of TTP promotes degradation. Recent high throughput photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) data is available for these two key regulatory RNA binding proteins. This data suggests that both HuR and TTP bind to similar AU-rich elements which are typically 50–150 nucleotides long and generally located within the 3′ UTR. The extent to which sequence features discriminate between the two binding proteins remains an interesting question.
Two discriminative methods, the k-spectrum kernel method and DREME, both discovered HuR and TTP motifs that were consistent with the published motifs; the HuR k-mers were predominantly U-rich and the TTP k-mers AU-rich. While the success rate of these methods was comparable, the k-spectrum kernel method had higher sensitivity and positive predictive value (PPV) than DREME. With discriminative methods, sensitivity and specificity are often trade-offs. This is likely the case here, as DREME had a slightly better success rate. In some cases, a higher sensitivity is preferred, particularly when subsequent experiments will validate the predicted sites and there is a cost to missing potential targets.
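For reference, the metrics compared above follow from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) using their standard definitions. The sketch below uses made-up counts for two hypothetical predictors to show how one method can have higher sensitivity while the other has higher specificity. The numbers are not results from this study, and the paper's "success rate" is not defined here, so it is not computed.

```python
def sensitivity(tp, fn):
    """Fraction of true binding sites that were predicted (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp)


def ppv(tp, fp):
    """Positive predictive value: fraction of predictions that are real."""
    return tp / (tp + fp)


# Hypothetical confusion counts, for illustration only.
method_a = {"tp": 90, "fn": 10, "tn": 70, "fp": 30}  # more permissive
method_b = {"tp": 75, "fn": 25, "tn": 90, "fp": 10}  # more conservative

for name, c in (("A", method_a), ("B", method_b)):
    print(
        name,
        round(sensitivity(c["tp"], c["fn"]), 2),
        round(specificity(c["tn"], c["fp"]), 2),
        round(ppv(c["tp"], c["fp"]), 2),
    )
```

In this toy comparison the permissive predictor misses fewer true sites at the cost of more false positives, which is the kind of trade-off to weigh when predicted sites will be validated experimentally.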
To summarize, the discriminative methods were able to identify binding motifs of both RNA binding proteins. The k-spectrum kernel method provided additional insight into the nucleotides around the binding sites. Neither feature engineering nor domain adaptation identified protein-specific k-mers, further corroborating the extent to which these proteins recognize similar sequence features. Despite this, the increased k-mer length and sensitivity of the k-spectrum approach suggest that it is an attractive approach for predicting unknown binding motifs of other RBPs.