Sotudian Shahabeddin, Desta Israel T, Hashemi Nasser, Zarbafian Shahrooz, Kozakov Dima, Vakili Pirooz, Vajda Sandor, Paschalidis Ioannis Ch
Division of Systems Engineering, Boston University, Boston, USA.
Department of Biomedical Engineering, Boston University.
Comput Struct Biotechnol J. 2021 Apr 20;19:2269-2278. doi: 10.1016/j.csbj.2021.04.028. eCollection 2021.
We develop a method to rank clusters of similar protein complex conformations generated by an underlying docking program. The method uses robust regression to predict the relative quality difference between any pair of clusters and combines these pairwise assessments into a ranked list of clusters, from higher to lower quality. We apply the method, RRPCC, to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, show an improvement of 24-100% in ranking acceptable or better quality clusters first, and of 15-100% in ranking medium or better quality clusters first. We compare the RRPCC-ClusPro combination to a number of alternatives and show that very different machine learning approaches to scoring docked structures yield similar success rates. Interestingly, some features important for improved scoring are internal energy terms that arise only from the local energy minimization applied in the refinement stage following rigid body docking. Finally, we discuss the current limitations on sampling and scoring, looking ahead to further improvements.
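The pairwise ranking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the regression formulation, or the aggregation rule, so everything below is an assumption. Here a linear model is fitted with a Huber loss (one common form of robust regression) using iteratively reweighted least squares, the predicted quality difference between clusters i and j is taken as w·(x_i − x_j), and clusters are ranked by the sum of their predicted differences against all others. The function names `huber_irls`, `pairwise_differences`, and `rank_clusters` are hypothetical.

```python
import numpy as np


def huber_irls(X, y, delta=1.0, n_iter=50):
    """Fit linear weights under a Huber loss by iteratively reweighted
    least squares. Assumption: a stand-in for 'robust regression'."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        residual = y - X @ w
        abs_res = np.maximum(np.abs(residual), 1e-12)
        # Huber weights: 1 for small residuals, delta/|r| for large ones
        weights = np.where(abs_res <= delta, 1.0, delta / abs_res)
        sw = np.sqrt(weights)
        w = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return w


def pairwise_differences(features, w):
    """D[i, j] = predicted quality of cluster i minus that of cluster j."""
    diffs = features[:, None, :] - features[None, :, :]  # shape (n, n, d)
    return diffs @ w  # shape (n, n)


def rank_clusters(D):
    """Rank clusters from best to worst by summed pairwise advantage."""
    scores = D.sum(axis=1)
    return np.argsort(-scores, kind="stable")


# Toy usage with made-up features for three clusters (illustrative only).
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_demo = np.array([2.0, 1.0])  # hypothetical, already-fitted weights
ranking = rank_clusters(pairwise_differences(feats, w_demo))
print(ranking)
```

With a purely linear model, the summed pairwise score reduces to ranking by w·x_i, so the pairwise step matters mainly when the pairwise predictor is nonlinear or trained on pair-specific features; the sketch keeps the pairwise matrix explicit so that such a predictor could be substituted.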