Quignot Chloé, Granger Pierre, Chacón Pablo, Guerois Raphael, Andreani Jessica
Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France.
Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, C.S.I.C., Serrano 119, 28006 Madrid, Spain.
Bioinformatics. 2021 Oct 11;37(19):3175-3181. doi: 10.1093/bioinformatics/btab254.
The crucial role of protein interactions and the difficulty in characterizing them experimentally strongly motivate the development of computational approaches for structural prediction. Even when protein-protein docking samples correct models, current scoring functions struggle to discriminate them from incorrect decoys. Incorporating conservation and coevolution information has previously shown promise for improving protein-protein scoring. Here, we present a novel strategy to integrate atomic-level evolutionary information into different types of scoring functions to improve their docking discrimination.
We applied this general strategy to our residue-level statistical potential from InterEvScore and to two atomic-level scores, SOAP-PP and Rosetta interface score (ISC). Including evolutionary information from as few as 10 homologous sequences improves the top 10 success rates of individual atomic-level scores SOAP-PP and Rosetta ISC by 6 and 13.5 percentage points, respectively, on a large benchmark of 752 docking cases. The best individual homology-enriched score reaches a top 10 success rate of 34.4%. A consensus approach based on the complementarity between different homology-enriched scores further increases the top 10 success rate to 40%.
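The top 10 success rates quoted above can be illustrated with a minimal sketch. It assumes the usual docking-benchmark definition (a case counts as a success if at least one acceptable model appears among the N best-scored decoys) and that lower scores are better; the function name, data layout, and toy values are hypothetical and are not taken from the InterEvScore pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each decoy is (score, is_acceptable).
# Assumption: a lower score means a better-ranked decoy.
Decoy = Tuple[float, bool]


def top_n_success_rate(cases: Dict[str, List[Decoy]], n: int = 10) -> float:
    """Return the percentage of cases with an acceptable decoy in the top n."""
    if not cases:
        raise ValueError("no docking cases given")
    hits = 0
    for decoys in cases.values():
        # Rank decoys from best (lowest) to worst (highest) score.
        ranked = sorted(decoys, key=lambda d: d[0])
        # The case is a success if any of the n best decoys is acceptable.
        if any(acceptable for _, acceptable in ranked[:n]):
            hits += 1
    return 100.0 * hits / len(cases)


# Toy data for illustration only, not benchmark results.
toy_cases = {
    "case_A": [(-12.0, True), (-9.5, False), (-3.0, False)],
    "case_B": [(-8.0, False), (-7.0, False), (-1.0, True)],
}
rate_top2 = top_n_success_rate(toy_cases, n=2)
```

With the toy data, only `case_A` has an acceptable decoy among its two best-scored models, so `rate_top2` is 50.0. A difference between two such rates is what is reported in percentage points.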
All data used for benchmarking and scoring results, as well as a Singularity container of the pipeline, are available at http://biodev.cea.fr/interevol/interevdata/.
Supplementary data are available at Bioinformatics online.