Identification of multiple RNAs using feature fusion.

Affiliation

National Agri-Food Biotechnology Institute, Sector 81, SAS Nagar, 140306, Punjab, India.

Publication information

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab178.

DOI: 10.1093/bib/bbab178
PMID: 33971667
Abstract

Detection of novel transcripts with deep sequencing has increased the demand for computational algorithms, as identifying and validating them with in vivo techniques is time-consuming, costly and unreliable. Most of the discovered transcripts are non-coding RNAs (ncRNAs), a large group known for diverse functional roles but lacking a common taxonomy. Thus, once a transcript is found to lack coding potential, it is crucial to recognize its primary functional category. To address this heterogeneity, we divide the ncRNAs into three classes and present the RNA classifier (RNAC), which categorizes RNAs into coding, housekeeping, small non-coding and long non-coding classes. RNAC uses alignment-based genomic descriptors to extract statistical, local binary pattern and histogram features, and fuses them to construct classification models with extreme gradient boosting. The experiments are performed on four species, and performance is assessed on multiclass and conventional binary (coding versus non-coding) classification problems. The proposed approach achieved >93% accuracy on both classification problems and also outperformed other well-known existing methods in coding potential prediction. This validates the usefulness of feature fusion for improved performance on both types of classification problems. Hence, RNAC is a valuable tool for the accurate identification of multiple RNAs.
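The feature-fusion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the descriptors, the local binary pattern variant or the bin counts, so the input here is assumed to be a per-position numeric signal scaled to [0, 1], and the function names (`statistical_features`, `lbp_codes`, `normalised_histogram`, `fuse_features`) and parameters (`radius`, `n_bins`) are hypothetical. The sketch only shows how three feature families might be concatenated into one vector.

```python
from statistics import mean, median, pstdev


def statistical_features(signal):
    """Summary statistics of a descriptor signal (an assumed feature set)."""
    return [mean(signal), pstdev(signal), median(signal), min(signal), max(signal)]


def lbp_codes(signal, radius=2):
    """1D local binary pattern: one integer code per interior position.

    Each of the 2 * radius neighbours contributes one bit, set when the
    neighbour is greater than or equal to the centre value.
    """
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= signal[i]:
                code |= 1 << bit
        codes.append(code)
    return codes


def normalised_histogram(values, n_bins, low, high):
    """Fixed-range histogram whose bins sum to 1 (all zeros if values is empty)."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for v in values:
        idx = min(int((v - low) / width), n_bins - 1)  # clamp v == high into last bin
        counts[idx] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def fuse_features(signal, radius=2, n_bins=8):
    """Concatenate statistical, LBP-histogram and value-histogram features."""
    stats = statistical_features(signal)
    n_codes = 2 ** (2 * radius)  # number of distinct LBP codes
    lbp_hist = normalised_histogram(lbp_codes(signal, radius), n_codes, 0, n_codes)
    value_hist = normalised_histogram(signal, n_bins, 0.0, 1.0)  # assumes values in [0, 1]
    return stats + lbp_hist + value_hist


# Toy descriptor signal (made-up values, for illustration only).
signal = [0.1, 0.4, 0.35, 0.9, 0.2, 0.6, 0.75, 0.3]
fused = fuse_features(signal)
print(len(fused))
```

With the default parameters the fused vector has 5 + 16 + 8 = 29 entries. In a pipeline like the one the abstract outlines, one such vector per transcript would then be passed to a gradient-boosted classifier; that step is left out here because the abstract does not give the model configuration.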


Similar articles

1
Identification of multiple RNAs using feature fusion.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab178.
2
CPPred: coding potential prediction based on the global description of RNA sequence.
Nucleic Acids Res. 2019 May 7;47(8):e43. doi: 10.1093/nar/gkz087.
3
Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs.
Int J Mol Sci. 2021 Aug 13;22(16):8719. doi: 10.3390/ijms22168719.
4
The stacking strategy-based hybrid framework for identifying non-coding RNAs.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab023.
5
DotAligner: identification and clustering of RNA structure motifs.
Genome Biol. 2017 Dec 28;18(1):244. doi: 10.1186/s13059-017-1371-3.
6
Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients.
BMC Cancer. 2019 Dec 3;19(1):1176. doi: 10.1186/s12885-019-6338-1.
7
Computational identification of human long intergenic non-coding RNAs using a GA-SVM algorithm.
Gene. 2014 Jan 1;533(1):94-9. doi: 10.1016/j.gene.2013.09.118. Epub 2013 Oct 9.
8
PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme.
BMC Bioinformatics. 2014 Sep 19;15(1):311. doi: 10.1186/1471-2105-15-311.
9
A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts.
BMC Genomics. 2017 Oct 18;18(1):804. doi: 10.1186/s12864-017-4178-4.
10
A Feature Fusion Predictor for RNA Pseudouridine Sites with Particle Swarm Optimizer Based Feature Selection and Ensemble Learning Approach.
Curr Issues Mol Biol. 2021 Nov 1;43(3):1844-1858. doi: 10.3390/cimb43030129.

Cited by

1
A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs.
Nucleic Acids Res. 2022 Nov 28;50(21):12094-12111. doi: 10.1093/nar/gkac1092.