Luo Qiyao, Zhao Liang, Hu Jianxing, Jin Hongwei, Liu Zhenming, Zhang Liangren
State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, P. R. China.
PLoS One. 2017 Feb 14;12(2):e0171433. doi: 10.1371/journal.pone.0171433. eCollection 2017.
Target fishing often relies on reverse docking to identify the potential target proteins of a ligand from a protein database. Reverse docking is limited by the accuracy of the scoring functions used to distinguish true targets from non-target proteins. Many contemporary scoring functions were designed for virtual screening of small molecules and have not been optimized for reverse docking, so their scores are easily influenced by the properties of protein pockets and become biased toward proteins with certain properties. This bias produces many false positives in reverse docking and interferes with the identification of true targets. In this paper, we conducted a large-scale reverse docking study (5000 molecules against 100 proteins) to examine scoring bias in DOCK, Glide, and AutoDock Vina. We found that all three docking programs produced frequent hits, which we call interference proteins. After analyzing how the pocket properties of these interference proteins differ from those of the other proteins, we speculated that the interference proteins have a larger contact area with ligands (related to the size and shape of the pocket; for all three docking programs) or higher hydrophobicity (for Glide), either of which could cause scoring bias. We then applied a score normalization method to eliminate this bias, which made docking scores more balanced across proteins in the reverse docking of the benchmark dataset. Finally, we used the Astex Diverse Set to validate the effect of score normalization on actual reverse docking cases: the accuracy of target prediction increased significantly, by 21.5%, in reverse docking with Glide after score normalization, whereas there was no obvious change with DOCK or AutoDock Vina.
Our results demonstrate that score normalization is effective at eliminating scoring bias and improving the accuracy of target prediction in reverse docking. Moreover, the pocket properties that we found to bias scores toward certain proteins can provide a theoretical basis for further optimizing the scoring functions of docking programs in future research.
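The abstract does not state which normalization formula was used, so the following is only a minimal sketch of one common choice: a per-protein z-score, in which each protein's docking scores are centered and scaled by that protein's mean and standard deviation over all docked ligands. The function names, the toy scores, and the convention that a lower score is more favorable are all assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def normalize_scores(scores):
    """Per-protein z-score normalization (an assumed scheme, not the paper's).

    scores: dict mapping protein name -> list of docking scores,
            one per ligand, with ligands in the same order for every protein.
    Returns a dict of the same shape holding z-scores.
    """
    normalized = {}
    for protein, values in scores.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            # All ligands scored identically; no ligand stands out.
            normalized[protein] = [0.0 for _ in values]
        else:
            normalized[protein] = [(v - mu) / sigma for v in values]
    return normalized


def top_target(scores, ligand_index):
    """Return the protein with the lowest (assumed most favorable) score."""
    return min(scores, key=lambda protein: scores[protein][ligand_index])


# Hypothetical toy scores: protein_A behaves like an interference protein,
# scoring favorably against every ligand regardless of true affinity.
raw = {
    "protein_A": [-9.0, -9.5, -8.5, -9.0],
    "protein_B": [-5.0, -5.0, -8.0, -5.0],
    "protein_C": [-6.0, -7.0, -6.5, -6.5],
}

normalized = normalize_scores(raw)
print(top_target(raw, 2))         # ranking on raw scores
print(top_target(normalized, 2))  # ranking on normalized scores
```

In this toy case the raw ranking for the third ligand picks protein_A, the frequent hitter, while the normalized ranking picks protein_B, the only protein for which that ligand scores clearly better than the protein's own average.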