Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK.
J Chem Inf Model. 2012 May 25;52(5):1146-58. doi: 10.1021/ci2004658. Epub 2012 Apr 19.
Conformer generation has important implications in cheminformatics, particularly in computational drug discovery, where the quality of conformer generation software may affect the outcome of a virtual screening exercise. We examine the performance of four freely available small-molecule conformer generation tools (Balloon, Confab, Frog2, and RDKit) alongside a commercial tool (MOE). The aim of this study is threefold: (i) to identify which tools most accurately reproduce experimentally determined structures; (ii) to examine the diversity of the generated conformational set; and (iii) to benchmark the computational time expended. These aspects were tested using a set of 708 drug-like molecules assembled from the OMEGA validation set and the Astex Diverse Set. These molecules have varying physicochemical properties and at least one known X-ray crystal structure. We found that RDKit and Confab are statistically better than the other methods at generating conformers with a low RMSD to the known structure. RDKit is particularly suited to less flexible molecules, while Confab, with its systematic approach, is able to generate conformers that are geometrically closer to the experimentally determined structure for molecules with a large number of rotatable bonds (≥10). In our tests, RDKit was also the second-fastest method after Frog2. To enhance the performance of RDKit, we developed a postprocessing algorithm to build a diverse and representative set of conformers that also contains a conformer close to the known structure. Our analysis indicates that, with postprocessing, RDKit is a valid free alternative to commercial, proprietary software.
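The two quantities the abstract relies on, RMSD to a known structure and diversity of the conformer set, can be illustrated with a small sketch. This is not the authors' postprocessing algorithm, which the abstract does not describe. It is a generic greedy RMSD filter, written under the assumptions that all conformers are already superimposed, share the same atom ordering, and are given as plain lists of (x, y, z) tuples. The function names and the 0.5 Å threshold are hypothetical choices for illustration.

```python
import math


def rmsd(conf_a, conf_b):
    """RMSD between two conformers given as lists of (x, y, z) tuples.

    Assumes both conformers are already superimposed and list the
    same atoms in the same order; no alignment is performed here.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def select_diverse(conformers, threshold):
    """Greedy filter: keep a conformer only if its RMSD to every
    conformer kept so far is at least `threshold`.

    Returns the indices of the kept conformers, in input order.
    """
    kept = []
    for i, conf in enumerate(conformers):
        if all(rmsd(conf, conformers[j]) >= threshold for j in kept):
            kept.append(i)
    return kept


def best_rmsd_to_reference(conformers, reference):
    """Lowest RMSD between any conformer in the set and the reference."""
    return min(rmsd(conf, reference) for conf in conformers)


# Toy two-atom "molecule"; coordinates are made up for illustration.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],  # near-duplicate of the first
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],  # clearly different geometry
]

kept = select_diverse(conformers, threshold=0.5)
print(kept)  # indices of the conformers retained by the filter
print(best_rmsd_to_reference([conformers[i] for i in kept], reference))
```

In practice the conformers would first be aligned to one another (and molecular symmetry taken into account) before computing RMSD. A greedy filter like this one depends on input order, so sorting conformers by energy beforehand is a common design choice.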