Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, UK.
J Chem Inf Model. 2012 May 25;52(5):1262-74. doi: 10.1021/ci2005934. Epub 2012 Apr 17.
A major problem in structure-based virtual screening is the appropriate selection of a single protein structure, or even multiple structures, to be used in the screening process. A priori, it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking as a function of ensemble size for eight targets of pharmaceutical interest. Starting from single-protein-structure docking results, up to 500,000 combinations of protein structures were generated for each ensemble size, and pose prediction and virtual screening results were derived for each ensemble. Comparison of single- and multiple-structure results suggests that the worst and the average performance over all protein ensembles of size two or greater improve on, respectively, the worst and the average performance over all single protein structures. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of scoring function, and the similarity of database ligands to the cocrystallized ligands of the ligand-bound protein structures in an ensemble. Because of these factors, prospectively selecting optimal ensembles is a challenging task, as a reassessment of published ensemble selection protocols shows.
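The abstract describes deriving ensemble results from single-structure docking runs, enumerating up to 500,000 structure combinations per ensemble size, and comparing worst and average performance across sizes. The sketch below illustrates only that bookkeeping, under stated assumptions that are not taken from the paper: (i) each ligand's ensemble score is its best score over the member structures (a common max-score merging rule; the abstract does not say which rule was used), (ii) higher scores are better, (iii) when combinations exceed the cap, distinct ensembles are drawn at random, and (iv) a hypothetical metric, the fraction of actives recovered in the top-ranked fraction of the database, stands in for the study's virtual screening measures. Pose prediction is not modeled. All function names and the toy scores are hypothetical.

```python
import itertools
import math
import random
from statistics import mean

# Hypothetical toy data: docking scores (higher = better) of four ligands
# against three protein structures. "l1"/"l2" are actives, "d1"/"d2" decoys.
TOY_SCORES = {
    "A": {"l1": 60, "l2": 30, "d1": 50, "d2": 20},
    "B": {"l1": 35, "l2": 65, "d1": 40, "d2": 45},
    "C": {"l1": 25, "l2": 28, "d1": 55, "d2": 52},
}
TOY_ACTIVES = {"l1", "l2"}


def ensemble_scores(single_scores, ensemble):
    """Assumed merging rule: each ligand keeps its best (maximum) score
    over the structures in the ensemble."""
    ligands = single_scores[ensemble[0]]
    return {lig: max(single_scores[s][lig] for s in ensemble) for lig in ligands}


def top_fraction_recall(scores, actives, fraction):
    """Hypothetical screening metric: fraction of actives ranked within
    the top `fraction` of the scored database."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    hits = sum(1 for lig in ranked[:n_top] if lig in actives)
    return hits / len(actives)


def sample_ensembles(structures, size, max_combos=500_000, seed=0):
    """Enumerate every ensemble of `size` structures, or draw up to
    `max_combos` distinct ensembles at random if there are more."""
    if math.comb(len(structures), size) <= max_combos:
        return list(itertools.combinations(structures, size))
    rng = random.Random(seed)
    sampled = set()
    while len(sampled) < max_combos:
        sampled.add(tuple(sorted(rng.sample(structures, size))))
    return sorted(sampled)


def summarize_by_size(single_scores, actives, max_size, fraction=0.1,
                      max_combos=500_000):
    """Return {ensemble size: (worst, average)} of the metric over all
    enumerated or sampled ensembles of that size."""
    structures = sorted(single_scores)
    summary = {}
    for size in range(1, max_size + 1):
        values = [
            top_fraction_recall(ensemble_scores(single_scores, e), actives, fraction)
            for e in sample_ensembles(structures, size, max_combos)
        ]
        summary[size] = (min(values), mean(values))
    return summary


if __name__ == "__main__":
    results = summarize_by_size(TOY_SCORES, TOY_ACTIVES, max_size=3, fraction=0.5)
    for size, (worst, avg) in results.items():
        print(f"size {size}: worst={worst:.2f}, average={avg:.2f}")
```

On the toy data this prints worst/average values of 0.00/0.33 for size 1, 0.50/0.67 for size 2, and 1.00/1.00 for size 3. These numbers are constructed for illustration only and are not results from the study; with real data, as the abstract notes, the outcome depends on sampling accuracy, the scoring function, and ligand similarity to cocrystallized ligands.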