Predicting functional long non-coding RNAs validated by low throughput experiments.

Author information

Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China.

College of Physics and Electronic Information, Dezhou University, Dezhou, China.

Publication information

RNA Biol. 2019 Nov;16(11):1555-1564. doi: 10.1080/15476286.2019.1644590. Epub 2019 Jul 26.

Abstract

High-throughput techniques have uncovered hundreds and thousands of long non-coding RNAs (lncRNAs). Among them, only a tiny fraction have functions that were experimentally validated by low-throughput methods (EVlncRNAs). What fraction of lncRNAs from high-throughput experiments (HTlncRNAs) is truly functional is an active subject of debate. Here, we developed the first method to distinguish EVlncRNAs from HTlncRNAs and mRNAs by using Support Vector Machines and found that EVlncRNAs can be well separated from HTlncRNAs and mRNAs, with a Matthews correlation coefficient of 0.6, a sensitivity of 64%, and a precision of 81% on the independent human test set. The most useful features for classification are related to sequence conservation at the RNA level (for separating EVlncRNAs from HTlncRNAs) and at the protein level (for separating them from mRNAs). The method is found to be robust: the human-RNA-trained model is applicable to independent mouse RNAs with similar accuracy and, to a lesser extent, to plant RNAs. The method can recover newly discovered EVlncRNAs with high sensitivity. Its application to 2000 randomly selected human HTlncRNAs indicates that the majority of HTlncRNAs are probably non-functional but a large portion (nearly 30%) are likely functional. In other words, there is an ample number of lncRNAs whose specific biological roles are yet to be discovered. The method developed here is expected to speed up discovery and reduce its cost by prioritizing potentially functional lncRNAs prior to experimental validation. EVlncRNA-pred is available as a web server at http://biophy.dzu.edu.cn/lncrnapred/index.html. All datasets used in this study can be obtained from the same website.
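For readers less familiar with the three figures reported for the human test set, the sketch below shows how sensitivity, precision, and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function name `classification_metrics` and the counts in the example are hypothetical: the counts were invented so that the results land near the reported values, and they do not reflect the size or composition of the authors' actual test set.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, precision and MCC from confusion-matrix counts.

    tp/fn: positives (e.g. EVlncRNAs) predicted correctly / missed.
    tn/fp: negatives (e.g. HTlncRNAs or mRNAs) predicted correctly / wrongly.
    """
    # Sensitivity (recall): fraction of true positives that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of positive predictions that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "mcc": mcc}


# Hypothetical counts for illustration only (NOT taken from the paper).
example = classification_metrics(tp=64, fp=15, tn=185, fn=36)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a common choice when the positive class is much smaller than the negative class.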
