Friedrich Nils-Ole, de Bruyn Kops Christina, Flachsenberg Florian, Sommer Kai, Rarey Matthias, Kirchmair Johannes
Center for Bioinformatics, Universität Hamburg, Bundesstr. 43, Hamburg 20146, Germany.
J Chem Inf Model. 2017 Nov 27;57(11):2719-2728. doi: 10.1021/acs.jcim.7b00505. Epub 2017 Oct 18.
We assess and compare the performance of eight commercial conformer ensemble generators (ConfGen, ConfGenX, cxcalc, iCon, MOE LowModeMD, MOE Stochastic, MOE Conformation Import, and OMEGA) and one leading free algorithm, the distance geometry algorithm implemented in RDKit. The comparative study is based on a new version of the Platinum Diverse Dataset, a high-quality benchmarking dataset of 2859 protein-bound ligand conformations extracted from the PDB. Differences in the performance of commercial algorithms are much smaller than those observed for free algorithms in our previous study (J. Chem. Inf. Model. 2017, 57, 529-539). For commercial algorithms, the median minimum root-mean-square deviations measured between protein-bound ligand conformations and ensembles of a maximum of 250 conformers are between 0.46 and 0.61 Å. Commercial conformer ensemble generators are characterized by their high robustness, with at least 99% of all input molecules successfully processed and few or even no substantial geometrical errors detectable in their output conformations. The RDKit distance geometry algorithm (with minimization enabled) appears to be a good free alternative since its performance is comparable to that of the midranked commercial algorithms. Based on a statistical analysis, we elaborate on which algorithms to use and how to parametrize them for best performance in different application scenarios.
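The headline metric, the median of the per-ligand minimum RMSD over ensembles of at most 250 conformers, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes, for simplicity, that every conformer is already superposed onto the protein-bound reference and that atoms are listed in the same order, so no alignment or symmetry handling is done. The function names (`rmsd`, `min_rmsd`, `median_min_rmsd`) and the toy coordinates are hypothetical and chosen only for illustration.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Conformer = Sequence[Point]


def rmsd(a: Conformer, b: Conformer) -> float:
    """Plain coordinate RMSD. ASSUMPTION: a and b are already superposed
    and list the same atoms in the same order (no alignment is done)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("conformers must be non-empty and of equal size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def min_rmsd(reference: Conformer, ensemble: Sequence[Conformer],
             max_conformers: int = 250) -> float:
    """Smallest RMSD between the reference and any of the first
    max_conformers members of the ensemble."""
    considered = ensemble[:max_conformers]
    if not considered:
        raise ValueError("ensemble is empty")
    return min(rmsd(reference, conf) for conf in considered)


def median_min_rmsd(
    dataset: Sequence[Tuple[Conformer, Sequence[Conformer]]]
) -> float:
    """Median over all ligands of the per-ligand minimum RMSD."""
    return median(min_rmsd(ref, ens) for ref, ens in dataset)


# Toy data (hypothetical, two "ligands" with two atoms each).
ligand_1_ref: List[Point] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ligand_1_ens = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],  # poor match
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)],  # close match
]
ligand_2_ref: List[Point] = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ligand_2_ens = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.6)],
]
dataset = [(ligand_1_ref, ligand_1_ens), (ligand_2_ref, ligand_2_ens)]

if __name__ == "__main__":
    print(f"median minimum RMSD: {median_min_rmsd(dataset):.3f} Å")
```

In a real evaluation the RMSD would be computed after optimal superposition of the conformer onto the reference, typically over heavy atoms and with molecular symmetry taken into account; a cheminformatics toolkit such as RDKit would normally handle that step.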