BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction.

Author information

Biocomputing Unit, National Center for Biotechnology (CSIC), Darwin 3, Campus Univ. Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain; Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 29 St Giles' Oxford OX1 3LB, UK.

Biocomputing Unit, National Center for Biotechnology (CSIC), Darwin 3, Campus Univ. Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain.

Publication information

J Mol Biol. 2022 Jun 15;434(11):167556. doi: 10.1016/j.jmb.2022.167556. Epub 2022 Mar 21.

DOI:10.1016/j.jmb.2022.167556

PMID:35662471

Abstract

Computational approaches for predicting protein-protein interfaces are extremely useful for understanding and modelling the quaternary structure of protein assemblies. In particular, partner-specific binding site prediction methods make it possible to delineate the specific residues that compose the interface of a protein complex. In recent years, new machine learning and other algorithmic approaches have been proposed to solve this problem. However, little effort has been devoted to finding better training datasets to improve the performance of these methods. To demonstrate the importance of the training set compilation procedure, in this work we present BIPSPI+, a new version of our original server that is trained on carefully curated datasets and outperforms our original predictor. We show how prediction performance can be improved by selecting specific datasets that better describe particular types of protein interactions and interfaces (e.g. homo/hetero). In addition, our upgraded web server offers a new set of functionalities, such as the sequence-structure prediction mode, hetero- or homo-complex specialization, and a guided docking tool that computes 3D quaternary structure poses using the predicted interfaces. BIPSPI+ is freely available at https://bipspi.cnb.csic.es.
