Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

Author information

Michels James, Bandarupalli Ramya, Akbari Amin Ahangar, Le Thai, Xiao Hong, Li Jing, Hom Erik F Y

Affiliations

Department of Computer Science, University of Mississippi, University, MS.

Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS.

Publication information

ArXiv. 2024 Oct 17:arXiv:2409.13057v2.

Abstract

Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases of existing datasets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
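To make the idea of a protein or ligand "language" concrete, here is a minimal sketch. It assumes character-level tokenization of an amino-acid sequence and a SMILES string, and it computes scaled dot-product cross-attention weights from protein tokens to ligand tokens. The function names, the toy peptide, and the random embeddings are hypothetical stand-ins for learned representations. This is not the method of any particular model covered by the review. Because the embeddings are untrained, the resulting weights carry no biological meaning. In a trained model, maps like this are sometimes inspected as candidate interaction patterns, although how far they can be interpreted is itself debated.

```python
import math
import random

# Hypothetical toy example: names, sequences, and dimensions are
# illustrative assumptions, not taken from the review.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def tokenize_protein(seq):
    """Split a protein sequence into one token per residue."""
    unknown = set(seq) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return list(seq)


def tokenize_smiles(smiles):
    """Character-level SMILES tokenization. Real tokenizers usually keep
    multi-character atoms such as 'Cl' or '[NH3+]' together; omitted here."""
    return list(smiles)


def embed(tokens, dim, seed):
    """Map each distinct token to a fixed random vector, standing in for
    embeddings that a trained model would learn."""
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [table[tok] for tok in tokens]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(queries, keys):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) over keys.
    Row i says how strongly protein token i attends to each ligand token."""
    d = len(queries[0])
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]


protein_tokens = tokenize_protein("MKTAYIAKQR")  # arbitrary toy peptide
ligand_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
attn = cross_attention_weights(
    embed(protein_tokens, dim=8, seed=0),
    embed(ligand_tokens, dim=8, seed=1),
)

# For each residue, report the ligand token it attends to most.
for residue, row in zip(protein_tokens, attn):
    best = max(range(len(row)), key=row.__getitem__)
    print(residue, "->", ligand_tokens[best], f"{row[best]:.2f}")
```

Each row of `attn` is a probability distribution over the 21 SMILES characters of aspirin. Real PLI models replace the random vectors with learned embeddings, for example from LSTM or transformer encoders, and train the attention layers end to end.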

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5a8/11527106/bf4ba80e8a85/nihpp-2409.13057v2-f0001.jpg