Department of Biochemistry and Immunology.
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil.
Bioinformatics. 2020 Dec 30;36(Suppl_2):i726-i734. doi: 10.1093/bioinformatics/btaa805.
The discovery of protein-ligand-binding sites is a major step in elucidating protein function and investigating new functional roles. Detecting protein-ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and inexpensive.
We propose Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made comparable or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered state-of-the-art. Our method also achieved better results than the method from the CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable: it took 10-20 s on average to predict the binding site for a protein complex, whereas the state-of-the-art residue-centric method takes 2-5 h on average.
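To illustrate what modeling a residue environment as an atomic-level graph can look like, the following is a minimal sketch and not GRaSP's actual implementation. The `Atom` class, the `residue_environment_graph` function, the residue label format, and both distance cutoffs (`env_radius`, `edge_cutoff`) are assumptions made for illustration only; the abstract does not state how nodes, edges, or features are defined.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Atom:
    residue_id: str   # hypothetical label format, e.g. "A:1"
    name: str         # atom name, e.g. "CA"
    xyz: tuple        # (x, y, z) coordinates in angstroms


def residue_environment_graph(atoms, center_residue,
                              env_radius=8.0, edge_cutoff=4.5):
    """Build an atom-level graph around one residue.

    Both cutoffs are illustrative assumptions, not values from the paper.
    Returns (nodes, edges), where edges are index pairs into nodes.
    """
    center_atoms = [a for a in atoms if a.residue_id == center_residue]
    # Nodes: atoms within env_radius of any atom of the central residue.
    nodes = [a for a in atoms
             if any(dist(a.xyz, c.xyz) <= env_radius for c in center_atoms)]
    # Edges: pairs of node atoms no farther apart than edge_cutoff.
    edges = [(i, j)
             for (i, a), (j, b) in combinations(enumerate(nodes), 2)
             if dist(a.xyz, b.xyz) <= edge_cutoff]
    return nodes, edges


# Toy input: four atoms on a line; the last one is far from residue "A:1".
atoms = [
    Atom("A:1", "CA", (0.0, 0.0, 0.0)),
    Atom("A:2", "CA", (3.8, 0.0, 0.0)),
    Atom("A:3", "CA", (7.6, 0.0, 0.0)),
    Atom("A:9", "CA", (30.0, 0.0, 0.0)),
]
nodes, edges = residue_environment_graph(atoms, "A:1")
print(len(nodes), edges)
```

In a supervised setting, summary statistics of such a graph (for example node and edge counts, or per-atom-type tallies) could serve as the feature vector for one residue, with a binding or non-binding label as the target. That feature choice is likewise an assumption here.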
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary data are available at Bioinformatics online.