Leveraging large language models for literature-driven prioritization of protein binding pockets.

Authors

Stratiichuk Roman, Melnychenko Mykola, Koleiev Ihor, Voitsitskyi Taras, Husak Vladyslav, Shevchuk Nazar, Ostrovsky Zakhar, Bdzhola Volodymyr, Yesylevskyy Semen, Starosyla Serhii, Nafiiev Alan

Affiliations

Receptor.AI Inc., London N1 7GU, United Kingdom.

Department of Biophysics and Medical Informatics, Educational and Scientific Centre "Institute of Biology and Medicine", Taras Shevchenko Kyiv National University, Kyiv 01601, Ukraine.

Publication information

Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf449.

DOI: 10.1093/bioinformatics/btaf449
PMID: 40795239
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12371332/
Abstract

MOTIVATION

Accurately identifying and prioritizing protein binding pockets is a foundational element of small-molecule drug discovery. Defining these known pockets currently relies on a laborious manual process of extracting key residue data from selected publications, reconciling inconsistent terminology, and independently computing volumetric representations. This manual curation to ensure biological relevance is time-consuming, error-prone, and represents a major bottleneck for efficient, high-throughput drug discovery.

RESULTS

We present a novel approach for the identification and prioritization of protein binding pockets for small molecules by combining geometric pocket detection with large language models (LLMs). Our method leverages Fpocket to generate candidate pockets, which are then validated against published experimental data extracted from research articles using an LLM with a series of prompts fine-tuned to identify and extract residue-level information associated with experimentally confirmed binding sites. We developed a curated benchmark dataset of diverse proteins and associated literature to train and evaluate the LLM's performance in paper relevance assessment and pocket extraction.
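The validation step described above can be illustrated with a small sketch. The abstract does not state how candidate pockets are scored against literature-derived residues, so the recall-style overlap used here is an assumption, not the authors' metric. The names `overlap_score` and `prioritize_pockets` are hypothetical, and the residue sets are toy data standing in for pocket-lining residues parsed from Fpocket output and residues extracted from papers by the LLM.

```python
from typing import Dict, List, Set, Tuple

# A residue is identified by (chain ID, residue number).
Residue = Tuple[str, int]


def overlap_score(pocket: Set[Residue], reported: Set[Residue]) -> float:
    """Fraction of literature-reported residues that line the candidate pocket.

    Assumption: a recall-style overlap; the source does not specify the metric.
    """
    if not reported:
        return 0.0
    return len(pocket & reported) / len(reported)


def prioritize_pockets(
    candidates: Dict[int, Set[Residue]], reported: Set[Residue]
) -> List[Tuple[int, float]]:
    """Rank candidate pockets by descending overlap, breaking ties by pocket ID."""
    scored = [(pid, overlap_score(res, reported)) for pid, res in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy candidate pockets (hypothetical residues, keyed by pocket ID).
candidates = {
    1: {("A", 10), ("A", 11), ("A", 45), ("A", 80)},
    2: {("A", 45), ("A", 46), ("A", 80), ("A", 83)},
    3: {("A", 120), ("A", 121)},
}

# Toy residues reported in the literature for an experimentally confirmed site.
reported = {("A", 45), ("A", 46), ("A", 80)}

ranking = prioritize_pockets(candidates, reported)
for pocket_id, score in ranking:
    print(f"pocket {pocket_id}: overlap {score:.2f}")
```

In this toy case pocket 2 contains all three reported residues and ranks first, pocket 1 contains two of them, and pocket 3 contains none. A real pipeline would likely also need to reconcile residue numbering between the publication and the structure file, which this sketch ignores.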

AVAILABILITY AND IMPLEMENTATION

The developed benchmark dataset and methodology are freely available at the GitHub repository (https://github.com/receptor-ai/LLM-benchmark-dataset) and Zenodo (DOI: 10.5281/zenodo.15798647).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/6d486f5ea0da/btaf449f10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/0f67c01ef622/btaf449f11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/5b136238262a/btaf449f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/4ff26d6fa1b2/btaf449f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/b89b172dc8bb/btaf449f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/1241d3dc40c9/btaf449f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/0c2563dc9e1d/btaf449f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/9244394e732e/btaf449f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/72768fdc1e9c/btaf449f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/8e2ead96a671/btaf449f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4187/12371332/e8a5f68bf6ca/btaf449f9.jpg
