Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction.

Affiliation

Institute for Advanced Study, Shenzhen University, Shenzhen 518061, China.

Publication information

Genes (Basel). 2024 Aug 18;15(8):1090. doi: 10.3390/genes15081090.

DOI:10.3390/genes15081090

PMID:39202449

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353971/

Abstract

Protein and nucleic acid binding site prediction is a critical computational task that benefits a wide range of biological processes. Previous studies have shown that feature selection holds particular significance for this prediction task, making the generation of more discriminative features a key area of interest for many researchers. Recent progress has shown the power of protein language models in handling protein sequences, in leveraging the strengths of attention networks, and in successful applications to tasks such as protein structure prediction. This naturally raises the question of the applicability of protein language models in predicting protein and nucleic acid binding sites. Various approaches have explored this potential. This paper first describes the development of protein language models. Then, a systematic review of the latest methods for predicting protein and nucleic acid binding sites is conducted by covering benchmark sets, feature generation methods, performance comparisons, and feature ablation studies. These comparisons demonstrate the importance of protein language models for the prediction task. Finally, the paper discusses the challenges of protein and nucleic acid binding site prediction and proposes possible research directions and future trends. The purpose of this survey is to furnish researchers with actionable suggestions for comprehending the methodologies used in predicting protein-nucleic acid binding sites, fostering the creation of protein-centric language models, and tackling real-world obstacles encountered in this field.

