Carbery Anna, Buttenschoen Martin, Skyner Rachael, von Delft Frank, Deane Charlotte M
Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK.
Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK.
J Cheminform. 2024 Mar 14;16(1):32. doi: 10.1186/s13321-024-00821-4.
Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand performance on novel protein targets for which experimental structures are not available. An alternative is to provide computationally predicted protein structures, but this is not commonly tested. Moreover, because of the training data used, computationally predicted protein structures tend to be extremely accurate and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method based on the labelling of ESM-IF1 protein language model embeddings, combined with point cloud annotation and clustering. We show that IF-SitePred is not only competitive with state-of-the-art methods when predicting binding sites on experimental structures, but also performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.