Mapping the space of protein binding sites with sequence-based protein language models.

Author information

Oruç Tuğçe, Kadukova Maria, Davies Thomas G, Verdonk Marcel, Poelking Carl

Affiliation

Astex Pharmaceuticals, Cambridge, United Kingdom.

Publication information

Bioinformatics. 2025 Jun 27;41(6). doi: 10.1093/bioinformatics/btaf284.

DOI:10.1093/bioinformatics/btaf284

PMID:40576205

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12208174/

Abstract

MOTIVATION

Binding sites are the key interfaces that determine a protein's biological activity, and are therefore common targets for therapeutic intervention. Techniques that help us detect, compare and contextualise binding sites are hence of immense interest to drug discovery.

RESULTS

Here we present an approach that integrates protein language models with a 3D tessellation technique to derive rich and versatile representations of binding sites that combine functional, structural and evolutionary information with unprecedented detail. We demonstrate that the associated similarity metrics induce meaningful pocket clusterings by balancing local structure against global sequence effects. The resulting embeddings are shown to simplify a variety of downstream tasks: they help organise the "pocketome" in a way that efficiently contextualises new binding sites, construct performant druggability models, and define challenging train-test splits for believable benchmarking of pocket-centric machine-learning models.
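
The abstract does not say how EPoCS pools or compares embeddings, so the following is only a rough sketch under stated assumptions, not the authors' method or the epocs package API. It assumes that per-residue language-model embeddings are mean-pooled over the residues lining a pocket, that pockets are compared by cosine similarity, and that pockets are grouped by single-linkage clustering at a similarity threshold. Whole clusters are then assigned to either train or test, so near-duplicate pockets never straddle the split. The 3D tessellation step that selects pocket residues is not shown; residue indices are taken as given. The function names (pocket_embedding, cosine_similarity, cluster_pockets, cluster_split), the threshold and the toy 2D vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration only; NOT the epocs package API.
Vector = List[float]


def pocket_embedding(residue_embeddings: Sequence[Vector], pocket_residues: Sequence[int]) -> Vector:
    """Mean-pool per-residue embeddings over the residues lining one pocket (assumed pooling)."""
    if not pocket_residues:
        raise ValueError("a pocket needs at least one residue")
    dim = len(residue_embeddings[0])
    total = [0.0] * dim
    for idx in pocket_residues:
        for d, value in enumerate(residue_embeddings[idx]):
            total[d] += value
    return [v / len(pocket_residues) for v in total]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two pocket embeddings (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def cluster_pockets(embeddings: Dict[str, Vector], threshold: float) -> List[List[str]]:
    """Single-linkage clustering via union-find: link pockets whose similarity >= threshold."""
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Largest clusters first; ties broken by member names for determinism.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


def cluster_split(clusters: List[List[str]], test_fraction: float) -> Tuple[List[str], List[str]]:
    """Move whole clusters (smallest first) into the test set until it reaches test_fraction."""
    total = sum(len(c) for c in clusters)
    train: List[str] = []
    test: List[str] = []
    for cluster in reversed(clusters):
        if len(test) < test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


if __name__ == "__main__":
    # Toy 2D "per-residue embeddings"; real PLM embeddings have hundreds of dimensions.
    residues = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    pockets = {"A": [0, 1], "B": [4, 1], "C": [2, 3]}
    emb = {name: pocket_embedding(residues, idx) for name, idx in pockets.items()}
    clusters = cluster_pockets(emb, threshold=0.9)
    train_ids, test_ids = cluster_split(clusters, test_fraction=0.3)
    print(clusters)              # [['A', 'B'], ['C']]
    print(train_ids, test_ids)   # ['A', 'B'] ['C']
```

Single linkage keeps the sketch short, but it can chain dissimilar pockets together through intermediates; a real pipeline might well use a different clustering scheme or similarity threshold.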

AVAILABILITY AND IMPLEMENTATION

A Python package that implements the EPoCS method is freely available at https://github.com/tugceoruc/epocs.

SUPPLEMENTARY INFORMATION

Supplementary data (extended figures and method details) are available at Bioinformatics online.
