Predicting nucleic acid binding sites by attention map-guided graph convolutional network with protein language embeddings and physicochemical information.

Authors

Li Xiang, Peng Wei, Zhu Xiaolei

Affiliation

School of Information and Artificial Intelligence, Anhui Agricultural University, 130 Changjiang Road, Shushan District, Hefei, Anhui 230036, China.

Publication

Brief Bioinform. 2025 Aug 31;26(5). doi: 10.1093/bib/bbaf457.

DOI:10.1093/bib/bbaf457
PMID:40919912
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12415854/
Abstract

Protein-nucleic acid binding sites play a crucial role in biological processes such as gene expression, signal transduction, replication, and transcription. In recent years, with the development of artificial intelligence, protein language models, graph neural networks, and transformer architectures have been adopted to develop both structure-based and sequence-based predictive models. Structure-based methods benefit from the spatial relationship between residues and have shown promising performance. However, structure-based information requires 3D protein structures, which is a challenge for large-scale protein sequence spaces. To address this limitation, researchers have attempted to use predicted protein structure information to guide binding site prediction. While this strategy has improved accuracy, it still depends on the quality of structure predictions. Thus, some studies have returned to prediction methods based solely on protein sequences, particularly those using protein language models, which have greatly enhanced the prediction accuracy. This paper proposes a novel protein-nucleic acid binding site prediction framework, ATtention Maps and Graph convolutional neural networks to predict nucleic acid-protein Binding sites (ATMGBs), which first fuses protein language embeddings with physicochemical properties to obtain multiview information, then leverages the attention map of a protein language model to simulate the relationship between residues, and then utilizes graph convolutional networks for enhancing the feature representations for final prediction. ATMGBs was evaluated on several different independent test sets. The results indicate that the proposed approach significantly improves sequence-based prediction performance, even achieving prediction accuracy comparable to structure-based frameworks. The dataset and code used in this study are available at https://github.com/lixiangli01/ATMGBs.
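
The three stages named in the abstract (feature fusion, an attention-map-derived residue graph, and graph convolution) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation, which is in the repository linked above. Several choices here are assumptions made for the sketch: fusion is done by simple concatenation, the attention map is symmetrized to form an undirected graph, a single GCN layer is used, and all weights, dimensions, and function names are hypothetical placeholders filled with random values.

```python
import numpy as np

def normalize_adjacency(attn):
    """Turn an L x L attention map into a normalized graph adjacency.

    Assumption: the map is symmetrized so that edges are undirected.
    """
    sym = 0.5 * (attn + attn.T)
    # Add self-loops, then apply the standard GCN normalization
    # D^{-1/2} (A + I) D^{-1/2}.
    a = sym + np.eye(sym.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, h, w):
    """One graph convolution: aggregate neighbor features, project, ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

def predict_binding(embeddings, physchem, attn, w_gcn, w_out):
    """Return a per-residue binding probability in (0, 1)."""
    # Assumption: "fusion" is modeled as feature concatenation per residue.
    x = np.concatenate([embeddings, physchem], axis=1)
    a_hat = normalize_adjacency(attn)
    h = gcn_layer(a_hat, x, w_gcn)
    logits = h @ w_out
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Toy inputs standing in for real language-model outputs (all hypothetical).
rng = np.random.default_rng(0)
seq_len, d_plm, d_pc, hidden = 6, 8, 3, 4
embeddings = rng.normal(size=(seq_len, d_plm))   # language-model embeddings
physchem = rng.normal(size=(seq_len, d_pc))      # physicochemical features
raw = rng.normal(size=(seq_len, seq_len))
# Row-wise softmax, so the toy map is non-negative like a real attention map.
attn = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)

w_gcn = rng.normal(size=(d_plm + d_pc, hidden)) * 0.1
w_out = rng.normal(size=(hidden,)) * 0.1

probs = predict_binding(embeddings, physchem, attn, w_gcn, w_out)
print(probs.shape)  # one probability per residue
```

In a real setting the embeddings and attention map would come from a pretrained protein language model and the weights would be learned from labeled binding residues; the sketch only shows how the attention map can stand in for the residue contact graph that structure-based methods take from 3D coordinates.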
