School of Computer and Information Technology, Xinyang Normal University.
Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
Brief Bioinform. 2018 Sep 28;19(5):821-837. doi: 10.1093/bib/bbx022.
Understanding the molecular mechanisms that govern protein-protein interactions, and accurately modeling protein-protein docking, rely on accurate identification and prediction of protein-binding partners and protein-binding residues. We review over 40 methods that predict protein-protein interactions from protein sequences, including methods that predict interacting protein pairs, protein-binding residues for a pair of interacting sequences, and protein-binding residues in a single protein chain. We focus on the latter methods, which provide residue-level annotations and can be broadly applied to all protein sequences. We compare their architectures, inputs and outputs, and we discuss aspects related to their assessment and availability. We also perform the first comprehensive empirical comparison of representative predictors of protein-binding residues, using a novel, high-quality benchmark data set. We show that the selected predictors accurately discriminate protein-binding from non-binding residues and that newer methods outperform older designs. However, these methods cannot accurately separate residues that bind other molecules, such as DNA, RNA and small ligands, from protein-binding residues. This cross-prediction, defined as the incorrect prediction of nucleic-acid- and small-ligand-binding residues as protein binding, is substantial for all evaluated methods and is not driven by proximity to the native protein-binding residues. We discuss the reasons for this drawback and offer several recommendations. In particular, we postulate the need for a new generation of more accurate predictors and data sets, for a comprehensive assessment of cross-predictions in future studies, and for higher standards of availability of published methods.
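Cross-prediction, as defined above, can be quantified by asking what fraction of residues in each annotation class a predictor labels as protein binding. The following is a minimal Python sketch under stated assumptions: each residue carries a single hypothetical label ("protein", "dna", "rna", "ligand" or "none") and a binary protein-binding prediction. The function name, label scheme and toy data are illustrative only and are not the authors' evaluation code or benchmark.

```python
from collections import defaultdict
from typing import Dict, Sequence


def rate_predicted_as_protein_binding(
    predicted: Sequence[bool], labels: Sequence[str]
) -> Dict[str, float]:
    """Per-class fraction of residues that a predictor labels as protein binding.

    Hypothetical helper. For the "protein" class this is the sensitivity
    (true positive rate); for "none" it is the false positive rate among
    non-binding residues; for "dna", "rna" and "ligand" it is the
    cross-prediction rate.
    """
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for is_protein_binding, label in zip(predicted, labels):
        totals[label] += 1
        if is_protein_binding:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy example with made-up labels and predictions for ten residues.
labels = ["protein", "protein", "protein", "dna", "dna",
          "rna", "ligand", "none", "none", "none"]
predicted = [True, True, False, True, False,
             True, False, False, True, False]

rates = rate_predicted_as_protein_binding(predicted, labels)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f}")
```

A predictor that separates protein-binding from non-binding residues but still cross-predicts would show a high rate for "protein" and a low rate for "none", while the rates for the nucleic-acid and ligand classes would sit well above the "none" rate.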