Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

Authors

Michels James, Bandarupalli Ramya, Akbari Amin Ahangar, Le Thai, Xiao Hong, Li Jing, Hom Erik F Y

Affiliations

Department of Computer Science, University of Mississippi, University, MS.

Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, MS.

Publication information

ArXiv. 2024 Oct 17:arXiv:2409.13057v2.

PMID: 39483353
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11527106/

Abstract

Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted, including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases of existing datasets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
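
As a rough illustration of what treating proteins and ligands as "language" can mean in practice, the sketch below splits a protein sequence into residue tokens and a SMILES string into atom and bond tokens, then maps both to integer IDs of the kind an LSTM or transformer would consume. The function names, the simplified SMILES pattern, and the example sequences are assumptions made for illustration; the review does not prescribe a particular tokenizer.

```python
import re

# Simplified SMILES pattern (an assumption, not a full grammar): bracket atoms,
# the two-letter halogens Br and Cl, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_protein(sequence):
    """Split a protein sequence into one token per amino-acid residue."""
    return list(sequence.upper())


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(token_lists):
    """Assign an integer ID to every distinct token, reserving 0 and 1."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map tokens to IDs, falling back to <unk> for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


# Hypothetical inputs: a short peptide fragment and the SMILES string of aspirin.
protein_tokens = tokenize_protein("MKTAYIAK")
ligand_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")

vocab = build_vocab([protein_tokens, ligand_tokens])
protein_ids = encode(protein_tokens, vocab)
ligand_ids = encode(ligand_tokens, vocab)
```

In a real pipeline the two sequences would more likely use separate vocabularies or learned subword tokenizers; a shared character-level vocabulary is used here only to keep the sketch short.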

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5a8/11527106/6397144a84e0/nihpp-2409.13057v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5a8/11527106/bf4ba80e8a85/nihpp-2409.13057v2-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5a8/11527106/5351a1de8b8f/nihpp-2409.13057v2-f0002.jpg

Similar articles

1
Natural Language Processing Methods for the Study of Protein-Ligand Interactions.
ArXiv. 2024 Oct 17:arXiv:2409.13057v2.
2
Natural Language Processing Methods for the Study of Protein-Ligand Interactions.
J Chem Inf Model. 2025 Mar 10;65(5):2191-2213. doi: 10.1021/acs.jcim.4c01907. Epub 2025 Feb 24.
3
Screening for Depression Using Natural Language Processing: Literature Review.
Interact J Med Res. 2024 Nov 4;13:e55067. doi: 10.2196/55067.
4
Leveraging transformers-based language models in proteome bioinformatics.
Proteomics. 2023 Dec;23(23-24):e2300011. doi: 10.1002/pmic.202300011. Epub 2023 Jun 29.
5
Comparing deep learning architectures for sentiment analysis on drug reviews.
J Biomed Inform. 2020 Oct;110:103539. doi: 10.1016/j.jbi.2020.103539. Epub 2020 Aug 17.
6
NLP for Analyzing Electronic Health Records and Clinical Notes in Cancer Research: A Review.
J Pain Symptom Manage. 2025 May;69(5):e374-e394. doi: 10.1016/j.jpainsymman.2025.01.019. Epub 2025 Jan 31.
7
Leveraging Large Language Models for Improved Understanding of Communications With Patients With Cancer in a Call Center Setting: Proof-of-Concept Study.
J Med Internet Res. 2024 Dec 11;26:e63892. doi: 10.2196/63892.
8
Natural Language Processing Technologies for Public Health in Africa: Scoping Review.
J Med Internet Res. 2025 Mar 5;27:e68720. doi: 10.2196/68720.
9
Protein-Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools.
J Proteome Res. 2024 Dec 6;23(12):5395-5404. doi: 10.1021/acs.jproteome.4c00535. Epub 2024 Nov 11.
10
Natural language processing to predict isocitrate dehydrogenase genotype in diffuse glioma using MR radiology reports.
Eur Radiol. 2023 Nov;33(11):8017-8025. doi: 10.1007/s00330-023-10061-z. Epub 2023 Aug 11.
