Mapping the space of protein binding sites with sequence-based protein language models.

Author information

Oruç Tuğçe, Kadukova Maria, Davies Thomas G, Verdonk Marcel, Poelking Carl

Affiliations

Astex Pharmaceuticals, Cambridge, United Kingdom.

Publication information

Bioinformatics. 2025 Jun 27;41(6). doi: 10.1093/bioinformatics/btaf284.

Abstract

MOTIVATION

Binding sites are the key interfaces that determine a protein's biological activity, and therefore common targets for therapeutic intervention. Techniques that help us detect, compare and contextualise binding sites are hence of immense interest to drug discovery.

RESULTS

Here we present an approach that integrates protein language models with a 3D tessellation technique to derive rich and versatile representations of binding sites that combine functional, structural and evolutionary information with unprecedented detail. We demonstrate that the associated similarity metrics induce meaningful pocket clusterings by balancing local structure against global sequence effects. The resulting embeddings are shown to simplify a variety of downstream tasks: they help organise the "pocketome" in a way that efficiently contextualises new binding sites, construct performant druggability models, and define challenging train-test splits for believable benchmarking of pocket-centric machine-learning models.
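To make the idea of clustering pockets by embedding similarity, and then using the clusters to build a train-test split, more concrete, here is a minimal Python sketch. It is not the EPoCS implementation: the mean-pooling of per-residue vectors, the cosine similarity, the single-linkage threshold clustering, all function names, and the toy numbers are assumptions chosen purely for illustration. The pocket-lining residue indices are taken as given, and the 3D tessellation step is not reproduced.

```python
import math
import random


def mean_pool(residue_embeddings, pocket_residues):
    """Average the per-residue vectors of the residues lining one pocket."""
    dim = len(residue_embeddings[0])
    pooled = [0.0] * dim
    for idx in pocket_residues:
        for k in range(dim):
            pooled[k] += residue_embeddings[idx][k]
    return [v / len(pocket_residues) for v in pooled]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cluster_by_threshold(embeddings, threshold):
    """Single-linkage clustering: pockets connected by any chain of pairs
    with similarity >= threshold receive the same cluster label."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def cluster_split(labels, test_fraction, seed=0):
    """Hold out whole clusters until about test_fraction of pockets are in
    the test set, so that similar pockets never straddle the split."""
    clusters = sorted(set(labels))
    random.Random(seed).shuffle(clusters)
    test_clusters, n_test = set(), 0
    for c in clusters:
        if n_test >= test_fraction * len(labels):
            break
        test_clusters.add(c)
        n_test += labels.count(c)
    train = [i for i, c in enumerate(labels) if c not in test_clusters]
    test = [i for i, c in enumerate(labels) if c in test_clusters]
    return train, test


# Toy per-residue embeddings for one hypothetical protein (4 residues, 3 dims).
residues = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]

pocket_embeddings = [
    mean_pool(residues, [0, 1]),  # pocket 0: lined by residues 0 and 1
    mean_pool(residues, [2, 3]),  # pocket 1: lined by residues 2 and 3
    [1.0, 0.05, 0.0],             # pocket 2: made-up vector close to pocket 0
    [0.0, 1.0, 0.0],              # pocket 3: made-up vector close to pocket 1
]

labels = cluster_by_threshold(pocket_embeddings, threshold=0.9)
train_idx, test_idx = cluster_split(labels, test_fraction=0.5)
```

Splitting by cluster rather than by individual pocket is what makes such a split harder than a random one: a model evaluated on `test_idx` has seen no pocket from the same similarity cluster during training. In practice the per-residue vectors would come from a protein language model and the threshold would need tuning; both are left open here.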

AVAILABILITY AND IMPLEMENTATION

A Python package that implements the EPoCS method is freely available at https://github.com/tugceoruc/epocs.

SUPPLEMENTARY INFORMATION

Supplementary data (extended figures and method details) are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/901a/12208174/ef909ebf5e0f/btaf284f1.jpg
