Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures.

Authors

Carbery Anna, Buttenschoen Martin, Skyner Rachael, von Delft Frank, Deane Charlotte M

Affiliations

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK.

Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK.

Publication information

J Cheminform. 2024 Mar 14;16(1):32. doi: 10.1186/s13321-024-00821-4.

DOI:10.1186/s13321-024-00821-4

PMID:38486231

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10941399/

Abstract

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. However, due to the training data used, computationally-predicted protein structures tend to be extremely accurate, and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method which is based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.
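The abstract names clustering of an annotated point cloud as one step of the method but gives no detail. Purely as an illustration of that general idea, and not as the authors' implementation, the sketch below groups hypothetical 3D points that are assumed to have already been labelled as lying near predicted binding residues. Points within a distance cutoff are linked into the same cluster, and each cluster centroid is taken as a candidate site centre. The function names, the cutoff value, and the coordinates are all assumptions made for this example.

```python
from collections import deque
from math import dist


def cluster_points(points, cutoff):
    """Group 3D points into connected clusters.

    Two points are linked when they lie within `cutoff` of each other
    (single-linkage clustering, explored breadth-first).
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Collect neighbours first, then update the set of unvisited points.
            near = [j for j in unvisited if dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        clusters.append(sorted(members))
    # Largest clusters first, as a simple stand-in for ranking candidate sites.
    return sorted(clusters, key=lambda c: (-len(c), c[0]))


def centroid(points, members):
    """Mean coordinate of the points whose indices are in `members`."""
    n = len(members)
    return tuple(sum(points[i][k] for i in members) / n for k in range(3))


# Hypothetical labelled points (coordinates in angstroms), not data from the paper.
labelled_points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),  # first group
    (10.0, 10.0, 10.0), (11.0, 10.0, 10.0),             # second group
]
sites = cluster_points(labelled_points, cutoff=1.5)
centres = [centroid(labelled_points, c) for c in sites]
```

Here `sites` holds the point indices of each cluster and `centres` the corresponding candidate site centres. A real pipeline would more likely use a library clustering routine and a cutoff tuned on benchmark data; the stdlib version above is only meant to make the idea concrete.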
