National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
J Cheminform. 2011 Jan 27;3(1):4. doi: 10.1186/1758-2946-3-4.
PubChem, an open archive for the biological activities of small molecules, provides search and analysis tools to help users locate desired information. Many of these tools rely on some notion of chemical structure similarity. PubChem3D adds similarity between 3-D conformers of chemical structures to the existing similarity between 2-D chemical structure graphs. It is also desirable to relate theoretical 3-D descriptions of chemical structures to experimental biological activity, so it is important to be assured that theoretical conformer models can reproduce experimentally determined bioactive conformations. In the present study, we investigated the effects of three primary conformer generation parameters (the fragment sampling rate, the energy window size, and the force field variant) on the accuracy of theoretical conformer models, and determined optimal settings for PubChem3D conformer model generation and conformer sampling.
Using the software package OMEGA from OpenEye Scientific Software, Inc., theoretical 3-D conformer models were generated for 25,972 small-molecule ligands whose 3-D structures were experimentally determined. Different values for the primary conformer generation parameters were systematically tested to find optimal settings. Employing a greater fragment sampling rate than the default did not improve the accuracy of the theoretical conformer model ensembles. Increasing the energy window did increase the overall average accuracy, with rapid convergence observed at 10 kcal/mol for model building and 15 kcal/mol for torsion search; however, a subsequent study showed that an energy threshold of 25 kcal/mol for torsion search gave slightly improved results for larger and more flexible structures. Excluding Coulomb terms from the 94s variant of the Merck molecular force field (MMFF94s) in the torsion search stage gave more accurate conformer models at lower energy windows. Overall average accuracy in reproducing bioactive conformations was remarkably linear with respect to both non-hydrogen atom count ("size") and effective rotor count ("flexibility"). Using these as independent variables, a regression equation was developed to predict the RMSD accuracy with which a theoretical ensemble reproduces bioactive conformations. The equation was then modified to give a minimum RMSD conformer sampling value, intended to help ensure that 90% of the sampled theoretical models contain at least one conformer within the RMSD sampling value of a "bioactive" conformation.
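The regression step described above can be illustrated with a minimal sketch. It assumes a plain least-squares fit of the form RMSD ≈ a·size + b·flexibility + c, and it assumes that the "modification" for 90% coverage can be approximated by shifting the fitted plane upward by the 90th-percentile residual. Neither assumption is confirmed by the abstract, which gives no coefficients and does not state how the equation was modified. All function names are hypothetical and the data are synthetic, chosen only so the example runs.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rmsd_model(sizes: Sequence[float], rotors: Sequence[float],
                   rmsds: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of rmsd ~ a*size + b*rotors + c via the normal equations."""
    rows = [(float(s), float(r), 1.0) for s, r in zip(sizes, rotors)]
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(rows, rmsds)) for i in range(3)]
    a, b, c = solve_linear(ata, aty)
    return a, b, c


def coverage_offset(sizes: Sequence[float], rotors: Sequence[float],
                    rmsds: Sequence[float], coeffs: Tuple[float, float, float],
                    coverage: float = 0.90) -> float:
    """Smallest upward shift of the fitted plane so that at least `coverage`
    of the observed RMSDs lie at or below it (an assumed adjustment)."""
    a, b, c = coeffs
    residuals = sorted(y - (a * s + b * r + c) for s, r, y in zip(sizes, rotors, rmsds))
    k = max(math.ceil(coverage * len(residuals)) - 1, 0)
    return residuals[k]


# Synthetic, illustrative data only: (heavy-atom count, effective rotor count, best RMSD in angstroms).
SIZES = [10, 20, 30, 40, 15, 25, 35, 12, 28, 22]
ROTORS = [1, 5, 2, 8, 3, 6, 4, 2, 7, 3]
RMSDS = [0.55, 1.25, 1.05, 1.95, 0.85, 1.45, 1.35, 0.80, 1.50, 1.10]

coeffs = fit_rmsd_model(SIZES, ROTORS, RMSDS)
offset = coverage_offset(SIZES, ROTORS, RMSDS, coeffs)
print("fitted (a, b, c):", coeffs)
print("offset for 90% coverage:", offset)
```

In this sketch, adding `offset` to the fitted intercept yields a per-molecule RMSD value below which roughly 90% of the synthetic observations fall, which mirrors the stated goal of the sampling value without reproducing the published equation.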
Optimal parameters for conformer generation using OMEGA were explored and determined. An equation was developed that provides an RMSD sampling value based on the relative accuracy with which bioactive conformations are reproduced. The PubChem3D project uses the resulting conformer generation parameters and RMSD sampling values to generate theoretical conformer models.