Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction.

Affiliation

Institute for Advanced Study, Shenzhen University, Shenzhen 518061, China.

Publication information

Genes (Basel). 2024 Aug 18;15(8):1090. doi: 10.3390/genes15081090.

PMID: 39202449
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353971/
Abstract

Predicting protein-nucleic acid binding sites is a critical computational task that supports the study of a wide range of biological processes. Previous studies have shown that feature selection is particularly important for this task, so generating more discriminative features has become a key focus for many researchers. Recent progress has demonstrated the power of protein language models: built on attention networks, they process protein sequences effectively and have been applied successfully to tasks such as protein structure prediction. This naturally raises the question of whether protein language models can also be used to predict protein-nucleic acid binding sites, and various approaches have explored this possibility. This paper first describes the development of protein language models. It then systematically reviews the latest methods for predicting protein-nucleic acid binding sites, covering benchmark sets, feature generation methods, performance comparisons, and feature ablation studies. These comparisons demonstrate the importance of protein language models for the prediction task. Finally, the paper discusses the challenges of protein-nucleic acid binding site prediction and proposes possible research directions and future trends. The purpose of this survey is to give researchers actionable guidance for understanding the methods used to predict protein-nucleic acid binding sites, to foster the development of protein-centric language models, and to help tackle the real-world obstacles encountered in this field.
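The abstract refers to performance comparisons between binding-site predictors. Such predictors label each residue as binding or non-binding, so they are often compared with per-residue, threshold-based metrics. The minimal sketch below, in plain standard-library Python, computes precision, recall, and the Matthews correlation coefficient (MCC) from per-residue labels and predicted scores. The choice of metrics, the 0.5 threshold, and the function name `binding_site_metrics` are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def binding_site_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> dict:
    """Per-residue metrics for binary binding-site prediction (illustrative sketch).

    labels: 1 if the residue binds a nucleic acid, else 0.
    scores: predicted binding probability for each residue.
    threshold: hypothetical cutoff; in practice it is usually tuned on validation data.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Build the confusion matrix by thresholding each residue's score.
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "precision": precision, "recall": recall, "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy example: 8 residues, 3 of which are true binding residues.
    labels = [0, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.1, 0.8, 0.4, 0.2, 0.6, 0.05, 0.9, 0.3]
    print(binding_site_metrics(labels, scores))
```

For the toy input, the sketch finds 2 true positives, 1 false positive, 4 true negatives, and 1 false negative, giving an MCC of 7/15. MCC is a reasonable choice in this setting because binding residues are usually a small minority of all residues. It uses all four confusion-matrix cells, so a predictor that labels everything as non-binding gets an MCC of 0 rather than a deceptively high accuracy.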

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8155/11353971/58e31ffb7f88/genes-15-01090-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8155/11353971/36cd342d99a5/genes-15-01090-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8155/11353971/93af942c068f/genes-15-01090-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8155/11353971/f96963f0e2ef/genes-15-01090-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8155/11353971/856aa29c9e8b/genes-15-01090-g003.jpg