
iProtDNA-SMOTE: Enhancing protein-DNA binding sites prediction through imbalanced graph neural networks.

Authors

Huang Ruiyan, Qiu Wangren, Xiao Xuan, Lin Weizhong

Affiliations

School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen Jiangxi, China.

School of Information Engineering, Jiangxi Art & Ceramics Technology Institute, Jingdezhen Jiangxi, China.

Publication information

PLoS One. 2025 May 13;20(5):e0320817. doi: 10.1371/journal.pone.0320817. eCollection 2025.

DOI:10.1371/journal.pone.0320817
PMID:40359455
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12074593/
Abstract

Protein-DNA interactions play a crucial role in cellular biology and are essential for maintaining life processes and regulating cellular functions. We propose a method called iProtDNA-SMOTE, which combines imbalanced graph neural networks with pre-trained protein language models to predict DNA-binding residues. By working directly with imbalanced graph data, the approach addresses the class imbalance inherent in protein-DNA binding site prediction and thereby improves the model's generalization and specificity. We trained the model on two datasets, TR646 and TR573, and conducted a series of experiments to evaluate its performance. The model achieved AUC values of 0.850, 0.896, and 0.858 on the independent test datasets TE46, TE129, and TE181, respectively. These results indicate that iProtDNA-SMOTE outperforms existing methods in accuracy and generalization for predicting DNA binding sites. For the convenience of the scientific community, the benchmark datasets and code are publicly available at https://github.com/primrosehry/iProtDNA-SMOTE.
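
The abstract does not spell out how the class imbalance is handled, but the method's name points to SMOTE (Synthetic Minority Over-sampling Technique). The sketch below shows only the generic SMOTE idea: new minority-class samples are created by interpolating between a minority sample and one of its nearest minority neighbors. It is not the authors' implementation. The function name `smote_oversample`, the 16-dimensional random features (stand-ins for per-residue language-model embeddings), and the 10% binding-residue ratio are all assumptions made for illustration.

```python
import numpy as np


def smote_oversample(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority samples until both classes have equal size.

    Generic SMOTE interpolation on feature vectors (hypothetical helper,
    not taken from the iProtDNA-SMOTE code base).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(minority))
    if n_needed <= 0 or len(minority) < 2:
        return X, y  # already balanced, or too few samples to interpolate

    k = min(k, len(minority) - 1)
    # Pairwise squared Euclidean distances among minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbors,
    # then place the synthetic point on the segment between them.
    base = rng.integers(0, len(minority), size=n_needed)
    pick = neighbors[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))
    synthetic = minority[base] + gap * (minority[pick] - minority[base])

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_new, y_new


# Toy data: 200 "residues", 10% labeled as binding (assumed ratio).
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 16))
y_demo = np.zeros(200, dtype=int)
y_demo[:20] = 1

X_bal, y_bal = smote_oversample(X_demo, y_demo)
print(X_bal.shape, np.bincount(y_bal))
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class. In a graph setting, the new nodes would also need edges, which this feature-space sketch does not address.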


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f17/12074593/9374ca53fef3/pone.0320817.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f17/12074593/5f9251b7a914/pone.0320817.g002.jpg

