The stacking strategy-based hybrid framework for identifying non-coding RNAs.

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab023.

DOI: 10.1093/bib/bbab023
PMID: 33693454

Abstract

With the development of next-generation sequencing technology, a large number of transcripts need to be analyzed, and distinguishing non-coding ribonucleic acids (ncRNAs) from coding RNAs remains a challenge. For non-model organisms, many existing methods cannot identify ncRNAs because transcriptional data are lacking. Therefore, in addition to using deoxyribonucleic acid (DNA)-based and RNA-based features, we proposed a hybrid framework based on the stacking strategy to identify ncRNAs, and we added eight new features based on predicted peptides. The framework is a two-layer stacking classifier that combines random forest (RF), LightGBM, XGBoost and logistic regression (LR) models. We used this framework to build two types of models. We tested the cross-species ncRNA identification model on six species: human, mouse, zebrafish, fruit fly, worm and Arabidopsis. Compared with other tools, our model performed best on the Arabidopsis, worm and zebrafish datasets, with accuracies of 98.36%, 99.65% and 94.12%, respectively. For the performance metrics analysis, the datasets of the six species were pooled into a single set, on which our model achieved the best sensitivity, accuracy, precision and F1 values. For the plant-specific ncRNA identification model, the average values of the six metrics across the two experiments were all greater than 95%, which demonstrates that it can be used to identify ncRNAs in plants. These results indicate that the hybrid framework applies to both animals and plants and has significant advantages in cross-species ncRNA identification.
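
The two-layer stacking design described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' implementation. The toy threshold and nearest-centroid learners stand in for RF, LightGBM and XGBoost. Using logistic regression as the second-layer (meta) learner is an assumption, since the abstract does not say how the four models are split across the two layers. The fold count, the toy data and all function names are hypothetical.

```python
import math

def fit_stump(X, y, feature):
    """Toy base learner: threshold one feature midway between the class means."""
    pos = [row[feature] for row, label in zip(X, y) if label == 1]
    neg = [row[feature] for row, label in zip(X, y) if label == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    high_is_positive = mean_pos >= mean_neg

    def predict(row):
        return 1.0 if (row[feature] > threshold) == high_is_positive else 0.0
    return predict

def fit_centroid(X, y):
    """Toy base learner: predict the class whose centroid is nearer."""
    def centroid(label):
        rows = [row for row, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(row):
        d0 = sum((a - b) ** 2 for a, b in zip(row, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(row, c1))
        return 1.0 if d1 < d0 else 0.0
    return predict

def fit_logistic(Z, y, lr=0.5, epochs=200):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            p = 1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
            for j, zj in enumerate(z):
                grad_w[j] += (p - t) * zj
            grad_b += p - t
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return lambda z: 1 / (1 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))

# Hypothetical layer-1 learners (stand-ins for RF, LightGBM and XGBoost).
BASE_FITTERS = [
    lambda X, y: fit_stump(X, y, 0),
    lambda X, y: fit_stump(X, y, 1),
    fit_centroid,
]

def fit_stacking(X, y, k=5):
    """Layer 1: out-of-fold base predictions. Layer 2: logistic regression on them."""
    meta = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in BASE_FITTERS]
        for i in held_out:
            # Each sample is scored only by models that never saw it.
            meta[i] = [m(X[i]) for m in models]
    meta_model = fit_logistic(meta, y)
    final_models = [fit(X, y) for fit in BASE_FITTERS]  # refit on all data
    return lambda row: meta_model([m(row) for m in final_models])

# Toy, fully separable data: label 1 ("ncRNA") has larger feature values.
X = [[2 * (i % 2) + 0.2 * (i % 5), 2 * (i % 2) + 0.3 * (i % 3)] for i in range(20)]
y = [i % 2 for i in range(20)]
predict_proba = fit_stacking(X, y)
```

Calling `predict_proba(row)` returns the second-layer probability that `row` belongs to class 1. Training the meta-learner on out-of-fold predictions keeps it from fitting base-model outputs that were produced on the base models' own training samples, which is the usual motivation for stacking with cross-validation.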
