DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network.

Authors

Mu Qiang, Yu Guoping, Zhou Guomin, He Yubing, Zhang Jianhua

Affiliations

Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.

National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences/Hainan Seed Industry Laboratory, Sanya 572024, China.

Publication information

NAR Genom Bioinform. 2025 May 19;7(2):lqaf058. doi: 10.1093/nargab/lqaf058. eCollection 2025 Jun.

DOI:10.1093/nargab/lqaf058
PMID:40391089
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086546/
Abstract

Regulation of DNA or RNA at the transcriptional, post-transcriptional, and translational levels is a key step in the central dogma of molecular biology. DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) play pivotal roles in the precise regulation of gene expression in these steps. Both classes of proteins are nucleic acid-binding proteins (NABPs), so they exhibit significant similarity in both sequence and structure. However, traditional methods for identifying NABPs are typically time-consuming, costly, and challenging to scale up. Using deep learning to classify proteins has emerged as a more efficient solution to these issues. In this study, we propose DRBP-EDP, a phased classification method that integrates ESM-2 with a dual-path neural network. We also design a refined approach to dataset construction, which yields high-quality protein classification datasets. The model achieved 90.03% accuracy in the first stage, which classifies NABPs and non-nucleic acid-binding proteins, and 89.56% accuracy in the second stage, which classifies DBPs and RBPs. To enhance accessibility and usability, DRBP-EDP has been developed in both executable and web-based versions, which are publicly available at https://doi.org/10.5281/zenodo.14092184 and https://github.com/MuQiang-MQ/DRBP-EDP.
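
The phased decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_two_stage`, the 0.5 threshold, and the toy scorers are hypothetical, and the real model would derive its scores from ESM-2 embeddings passed through the dual-path network. The sketch only shows the control flow, in which a sequence reaches the DBP/RBP decision only after the first stage has labeled it an NABP.

```python
from typing import Callable, Sequence

# Hypothetical type: a scorer maps a protein feature vector to a probability in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def classify_two_stage(
    features: Sequence[float],
    stage1_nabp_prob: Scorer,
    stage2_dbp_prob: Scorer,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Return 'non-NABP', 'DBP', or 'RBP' using a two-stage cascade."""
    # Stage 1: nucleic acid-binding protein vs. everything else.
    if stage1_nabp_prob(features) < threshold:
        return "non-NABP"
    # Stage 2: only sequences accepted by stage 1 are split into DBP vs. RBP.
    if stage2_dbp_prob(features) >= threshold:
        return "DBP"
    return "RBP"


# Toy scorers standing in for trained models (purely illustrative):
# the first feature plays the role of the stage-1 score, the second the stage-2 score.
def toy_stage1(features: Sequence[float]) -> float:
    return features[0]


def toy_stage2(features: Sequence[float]) -> float:
    return features[1]


if __name__ == "__main__":
    for feats in ([0.2, 0.9], [0.8, 0.9], [0.8, 0.1]):
        print(feats, classify_two_stage(feats, toy_stage1, toy_stage2))
```

One consequence of a cascade like this is that a sequence rejected in the first stage never reaches the second, so first-stage errors cannot be corrected downstream.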

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/592ee984fdd4/lqaf058figgra1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/553d50f56db9/lqaf058fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/19400e800f39/lqaf058fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/2bc19bfcda3d/lqaf058fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/70bb23f2ec94/lqaf058fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/cfff99bcf7d4/lqaf058fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac4b/12086546/7328e0e1063f/lqaf058fig6.jpg
