Fakhry Carl Tony, Kulkarni Prajna, Chen Ping, Kulkarni Rahul, Zarringhalam Kourosh
Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA.
Department of Physics, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA 02125, USA.
BMC Genomics. 2017 Aug 22;18(1):645. doi: 10.1186/s12864-017-4057-z.
Small RNAs (sRNAs) constitute an important class of post-transcriptional regulators that control critical cellular processes in bacteria. Recent research using high-throughput transcriptomic approaches has led to a dramatic increase in the discovery of bacterial sRNAs. However, it is generally believed that the currently identified sRNAs constitute a limited subset of the bacterial sRNA repertoire. In several cases, sRNAs belonging to a specific class are already known and the challenge is to identify additional sRNAs belonging to the same class. In such cases, machine-learning approaches can be used to predict novel sRNAs in a given class.
In this work, we develop novel bioinformatics approaches that integrate sequence- and structure-based features to train machine-learning models for the discovery of bacterial sRNAs. We show that features derived from recurrent structural motifs in the ensemble of low-energy secondary structures can distinguish the RNA classes with high accuracy.
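As a rough illustration of how sequence and structure information might be combined into one feature vector, the following Python sketch computes k-mer frequencies from a sequence and a few summary statistics from an ensemble of dot-bracket secondary structures. It is not the feature set or pipeline used in this work, which the abstract does not specify. The function names, the choice of hairpin loops as the "motif", and the three ensemble statistics are all assumptions made for illustration, and the structures are assumed to have been produced beforehand by an external RNA folding tool.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=2):
    """Normalized k-mer frequencies over the RNA alphabet (sequence features)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {f"kmer_{m}": counts[m] / windows for m in kmers}


def hairpin_loops(structure):
    """Find hairpin loops in a dot-bracket string.

    A hairpin loop is taken to be a run of unpaired bases enclosed directly
    by an opening and a closing bracket. Returns (start, end) pairs, where
    the loop occupies positions start..end-1.
    """
    loops = []
    last_open = None
    for i, ch in enumerate(structure):
        if ch == "(":
            last_open = i
        elif ch == ")":
            if last_open is not None and last_open + 1 < i:
                loops.append((last_open + 1, i))
            last_open = None
    return loops


def ensemble_features(structures):
    """Summarize an ensemble of dot-bracket structures (illustrative only)."""
    n = len(structures)
    loop_counts = Counter()
    total_hairpins = 0
    total_paired = 0.0
    for s in structures:
        loops = hairpin_loops(s)
        total_hairpins += len(loops)
        loop_counts.update(set(loops))  # count each loop once per structure
        total_paired += (len(s) - s.count(".")) / len(s)
    return {
        "mean_hairpins": total_hairpins / n,
        "mean_paired_fraction": total_paired / n,
        # Fraction of structures sharing the most frequently recurring loop.
        "max_loop_recurrence": max(loop_counts.values(), default=0) / n,
    }


def feature_vector(seq, structures, k=2):
    """Concatenate sequence and structure features into one dictionary."""
    features = kmer_features(seq, k)
    features.update(ensemble_features(structures))
    return features


if __name__ == "__main__":
    # Toy input: a short sequence with two hypothetical low-energy structures.
    vec = feature_vector("GGAUCC", ["((..))", "(....)"])
    for name in ("kmer_GG", "mean_hairpins", "max_loop_recurrence"):
        print(name, vec[name])
```

Vectors built this way for known members of an sRNA class and for background sequences could then be passed to any standard classifier. Which classifier and which motifs work best would need to be determined empirically.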
We apply this approach to predict new members in two broad classes of bacterial small RNAs: 1) sRNAs that bind to the RNA-binding protein RsmA/CsrA in diverse bacterial species and 2) sRNAs regulated by the master regulator of virulence, ToxT, in Vibrio cholerae.
The involvement of sRNAs in bacterial adaptation to changing environments is an increasingly recurring theme in current research in microbiology. It is likely that future research, combining experimental and computational approaches, will discover many more examples of sRNAs as components of critical regulatory pathways in bacteria. We have developed a novel approach for prediction of small RNA regulators in important bacterial pathways. This approach can be applied to specific classes of sRNAs for which several members have been identified and the challenge is to identify additional sRNAs.