Guillén-Ramírez Hugo A, Martínez-Pérez Israel M
CICESE Research Center, Department of Computer Science, Ensenada, Mexico.
Biosystems. 2018 Dec;174:63-76. doi: 10.1016/j.biosystems.2018.09.001. Epub 2018 Sep 8.
Riboswitches are non-coding RNAs that regulate gene expression by altering the structural conformation of mRNA transcripts. Their regulatory mechanism could be exploited in biomedical applications such as drug targets and biosensors. A major challenge is accurately identifying metabolite-binding RNA switches, which are structurally complex and diverse. To address this, we investigated the classification of 16 riboswitch families using supervised learning algorithms trained solely on sequence-based features. We generated a reduced feature set and proposed a visual representation to explore its components. We induced Support Vector Machine, Random Forest, Naive Bayes, J48, and HyperPipes classifiers with our proposed feature set and tested their performance on independent data. Our best multi-class classifier achieved F-measure values of 0.996 and 0.966 in the training and test phases, respectively, outperforming a previous approach. When compared against BLAST, our best classifiers yielded competitive results. This work shows that classifiers trained with our sequence-based feature set accurately discriminate riboswitches.
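The abstract does not state which sequence-based features make up the feature set. As one common choice, assumed here purely for illustration, each RNA sequence can be mapped to a fixed-length vector of relative k-mer frequencies, which any of the listed classifiers can then consume. The function name `kmer_frequencies` and the default `k=3` are hypothetical, and the feature-reduction step mentioned in the abstract is not shown.

```python
from itertools import product

# Assumed alphabet: RNA bases; DNA input is converted by mapping T -> U.
ALPHABET = "ACGU"


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Hypothetical feature extractor: relative frequencies of all 4**k k-mers.

    The k-mers are listed in a fixed lexicographic order (AA..., AC..., ...), so
    sequences of different lengths all map to vectors of the same length.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:  # skip windows containing ambiguous bases such as N
            counts[idx] += 1
            total += 1

    # Normalize counts to frequencies; an empty or too-short sequence yields zeros.
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    vec = kmer_frequencies("GGACGU", k=2)
    print(len(vec))  # 16 dinucleotide features
```

For example, `kmer_frequencies("GGACGU", k=2)` counts the five dinucleotides GG, GA, AC, CG, and GU once each, so each of those positions holds 0.2 in a 16-element vector. Normalizing by the number of counted windows keeps short and long riboswitch sequences on a comparable scale before they are passed to a classifier.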