Loganantharaj Rasiah, Randall Thomas A
Bioinformatics Research Lab, The Center for Advanced Computer Studies, University of Louisiana, 301 East Lewis Street, P.O. Box 44330, Lafayette, LA, 70504, USA.
Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, Durham, NC, 27709, USA.
Methods Mol Biol. 2017;1617:133-158. doi: 10.1007/978-1-4939-7046-9_10.
MicroRNAs (miRNAs) are small (18-24 nt) endogenous RNAs, found across diverse phyla, that are involved in posttranscriptional regulation, primarily the downregulation of mRNAs. Experimentally determining miRNA-mRNA interactions can be expensive and time-consuming, making the accurate computational prediction of miRNA targets a high priority. Because miRNA-mRNA base pairing in mammals is not perfectly complementary and only a fraction of the identified motifs are real binding sites, accurately predicting miRNA targets remains challenging. The limitations and bottlenecks of existing algorithms and approaches are discussed in this chapter.
A new miRNA-mRNA interaction algorithm was implemented in Python (TargetFind) to capture three different modes of association and to maximize detection sensitivity to around 95% for mouse (mm9) and human (hg19) reference data. For human (hg19) data, the prediction accuracy obtained with any single feature (evolutionary conservation score, multiple targets in a UTR, or change in free energy) varied within a narrow range, from 63.5% to 66%. When the results of these features are combined by majority voting, the expected prediction accuracy increases to 69.5%. When all three features are used together, the average best prediction accuracies under tenfold cross-validation for the naïve Bayes, support vector machine, artificial neural network, and decision tree classifiers were 66.5%, 67.1%, 69%, and 68.4%, respectively. The results reveal the advantages and limitations of these approaches.
When the feature sets were compared on their strength in predicting true hg19 targets, the evolutionary conservation score slightly outperformed the other features, which were based on thermostability and target multiplicity. The sophisticated supervised learning algorithms did not significantly improve prediction accuracy over a simple threshold-based approach using the conservation score, or over combining the results of the individual features by majority agreement. The targets from randomly generated UTRs behaved similarly to those of noninteracting pairs with respect to changes in free energy. The availability of additional experimental data describing noninteracting pairs will advance our understanding of the characteristics of these interactions and of the factors that positively and negatively influence them.
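The majority-voting combination described in the abstract can be illustrated with a minimal sketch. This is not the TargetFind implementation: the `CandidateTarget` class, the function names, and all three threshold values are hypothetical and chosen only for illustration. Each feature casts one yes/no vote through a simple threshold, and a candidate is called a target when at least two of the three votes agree.

```python
from dataclasses import dataclass

# All cutoffs below are illustrative placeholders, NOT values from the chapter.
CONSERVATION_MIN = 0.5   # hypothetical cutoff on the conservation score
SITE_COUNT_MIN = 2       # hypothetical: at least two candidate sites in the UTR
DELTA_G_MAX = -15.0      # hypothetical cutoff in kcal/mol (more negative = more stable)


@dataclass
class CandidateTarget:
    """One miRNA-UTR candidate pair described by the three features (hypothetical type)."""
    conservation: float  # evolutionary conservation score
    site_count: int      # number of candidate sites for the miRNA in the UTR
    delta_g: float       # change in free energy of the duplex


def feature_votes(candidate):
    """Return one boolean vote per feature, each from a simple threshold."""
    return (
        candidate.conservation >= CONSERVATION_MIN,
        candidate.site_count >= SITE_COUNT_MIN,
        candidate.delta_g <= DELTA_G_MAX,
    )


def majority_vote(candidate):
    """Call the candidate a target when at least two of the three features agree."""
    return sum(feature_votes(candidate)) >= 2


def accuracy(candidates, labels):
    """Fraction of candidates whose majority-vote call matches the known label."""
    calls = [majority_vote(c) for c in candidates]
    return sum(call == label for call, label in zip(calls, labels)) / len(labels)
```

With labeled interacting and noninteracting pairs, `accuracy` gives the kind of figure the abstract reports for the combined features. In practice the thresholds would have to be tuned on held-out data, for example within each fold of a cross-validation.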