Prediction and classification of ncRNAs using structural information.

Authors

Panwar Bharat, Arora Amit, Raghava Gajendra P S

Affiliation

Bioinformatics Centre, Institute of Microbial Technology (CSIR), Sector 39A, Chandigarh, India.

Publication information

BMC Genomics. 2014 Feb 13;15:127. doi: 10.1186/1471-2164-15-127.

DOI:10.1186/1471-2164-15-127

PMID:24521294

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3925371/

Abstract

BACKGROUND

Evidence is accumulating that non-coding transcripts, previously thought to be functionally inert, play important roles in various cellular activities. High-throughput techniques such as next-generation sequencing have generated vast amounts of sequence data. It is therefore desirable not only to discriminate between coding and non-coding transcripts, but also to assign non-coding RNA (ncRNA) transcripts to their respective classes (families). Although several algorithms are available for this task, their classification performance remains a major concern. Given the crucial role that non-coding transcripts play in cellular processes, algorithms that can classify ncRNA transcripts precisely are needed.

RESULTS

In this study, we first develop prediction tools to discriminate between coding and non-coding transcripts and then classify ncRNAs into their respective classes. In contrast to existing methods that employ multiple features, our SVM-based method uses a single feature (tri-nucleotide composition) and achieves a Matthews correlation coefficient (MCC) of 0.98. Because the structure of an ncRNA transcript can provide insight into its biological function, we use graph properties of predicted ncRNA structures to classify transcripts into 18 different ncRNA classes. We developed classification models using a variety of algorithms (BayesNet, NaiveBayes, MultilayerPerceptron, IBk, libSVM, SMO and RandomForest) and observed that the RandomForest-based model performed better than the others. Compared with the GraPPLE study, sensitivity was higher for 13 classes and specificity was higher for 14 classes. Moreover, the overall sensitivity of 0.43 exceeds that of GraPPLE (0.33), and the overall MCC of 0.40 is significantly higher than GraPPLE's 0.29. This demonstrates that our models are more accurate than existing models.
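
To make the single-feature idea concrete, the following is a minimal Python sketch of how a tri-nucleotide composition vector and the MCC could be computed. The abstract does not say how windows are counted or normalized, so overlapping windows, normalization by the number of valid windows, and the function names `trinucleotide_composition` and `mcc` are all assumptions made for illustration. This is not the RNAcon implementation.

```python
from itertools import product
from math import sqrt

# All 64 possible tri-nucleotides over the RNA alphabet, in a fixed order.
TRIMERS = ["".join(p) for p in product("ACGU", repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-element vector of tri-nucleotide frequencies.

    Assumption: windows overlap (step of 1) and counts are divided by the
    number of valid windows. The abstract does not specify either choice.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    counts = dict.fromkeys(TRIMERS, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIMERS)
    return [counts[k] / total for k in TRIMERS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Usage: one 64-dimensional feature vector per transcript.
features = trinucleotide_composition("AUGGCUAGCUAGGCUUAA")
```

A vector like `features` would be the input to a binary classifier (an SVM in the study), and `mcc` summarizes how well predicted labels agree with true coding/non-coding labels on a scale from -1 to 1.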

CONCLUSIONS

This work demonstrates that a simple feature, tri-nucleotide composition, is sufficient to discriminate between coding and non-coding RNA sequences. Similarly, a feature set based on graph properties, combined with the RandomForest algorithm, is the most suitable of the approaches tested for classifying ncRNAs into classes. We have also developed RNAcon, available as both an online and a standalone tool (http://crdd.osdd.net/raghava/rnacon).
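
As a rough illustration of what "graph properties of a predicted structure" can mean, the sketch below turns a secondary structure in dot-bracket notation into a graph (nucleotides as nodes, backbone links and base pairs as edges) and derives a few simple properties. The abstract does not list the graph properties used, so the ones here (edge count, mean and maximum degree, paired fraction) and the function names are hypothetical examples. The sketch also assumes the structure has already been predicted by some folding tool, and it treats any character other than `(` and `)` as unpaired, so pseudoknot brackets are ignored.

```python
def dot_bracket_to_graph(structure):
    """Build an adjacency map from a dot-bracket string.

    Nodes are nucleotide positions. Edges are backbone links between
    consecutive positions plus one edge per base pair.
    """
    n = len(structure)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):  # backbone edges
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()  # position of the matching '('
            adj[i].add(j)
            adj[j].add(i)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return adj


def structure_features(structure):
    """Return a few illustrative graph properties (hypothetical feature set)."""
    adj = dot_bracket_to_graph(structure)
    n = len(adj)
    degrees = [len(neighbours) for neighbours in adj.values()]
    paired = structure.count("(") + structure.count(")")
    return {
        "nodes": n,
        "edges": sum(degrees) // 2,  # each edge is counted from both ends
        "mean_degree": sum(degrees) / n if n else 0.0,
        "max_degree": max(degrees, default=0),
        "paired_fraction": paired / n if n else 0.0,
    }


# Usage: a small hairpin with two base pairs and a three-base loop.
hairpin = structure_features("((...))")
```

A dictionary like `hairpin` would form one row of a feature table, with one row per transcript, that a multi-class learner such as a random forest could be trained on.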
