National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
J Cheminform. 2013 Jan 7;5(1):1. doi: 10.1186/1758-2946-5-1.
PubChem is a free and publicly available resource containing substance descriptions and their associated biological activity information. PubChem3D is an extension to PubChem containing computationally derived three-dimensional (3-D) structures of small molecules. All the tools and services that are part of PubChem3D rely on the quality of the 3-D conformer models. Construction of the conformer models currently available in PubChem3D involves a clustering stage to sample the conformational space spanned by the molecule. While this stage reduces the conformer models to a more manageable size, it may also reduce their ability to reproduce experimentally determined "bioactive" conformations, such as those found for PDB ligands. This study examines the extent of this accuracy loss and considers its effect on the 3-D similarity analysis of molecules.
Conformer models consisting of up to 100,000 conformers per compound were generated for 47,123 small molecules whose structures were experimentally determined, and the conformers in each model were clustered to reduce its size to a maximum of 500 conformers per molecule. The accuracy of the conformer models before and after clustering was evaluated using five different measures: root-mean-square distance (RMSD), shape-optimized shape-Tanimoto (ST(ST-opt)) and combo-Tanimoto (ComboT(ST-opt)), and color-optimized color-Tanimoto (CT(CT-opt)) and combo-Tanimoto (ComboT(CT-opt)). On average, clustering decreased the conformer model accuracy: the conformer ensemble's RMSD to the bioactive conformer increased by 0.18 ± 0.12 Å, and the ST(ST-opt), ComboT(ST-opt), CT(CT-opt), and ComboT(CT-opt) scores decreased by 0.04 ± 0.03, 0.16 ± 0.09, 0.09 ± 0.05, and 0.15 ± 0.09, respectively.
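The before-and-after comparison described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the abstract: the accuracy of a conformer model is taken to be the smallest RMSD between any of its conformers and the experimental reference, all conformers share the same atom order, and the coordinates are already superimposed (no optimal alignment is performed). The function names `rmsd`, `ensemble_accuracy`, and `accuracy_loss` are hypothetical and do not correspond to any PubChem3D tool.

```python
import math


def rmsd(a, b):
    """RMSD between two conformers given as equal-length lists of (x, y, z).

    Assumption: atoms are in the same order and the two coordinate sets
    are already superimposed; no alignment is done here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformers must have the same non-zero atom count")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def ensemble_accuracy(conformers, reference):
    """Smallest RMSD from any conformer in the model to the reference.

    Assumption: this minimum is used as the model's accuracy.
    """
    return min(rmsd(c, reference) for c in conformers)


def accuracy_loss(full_model, clustered_model, reference):
    """Increase in best RMSD caused by reducing the model."""
    return (ensemble_accuracy(clustered_model, reference)
            - ensemble_accuracy(full_model, reference))


# Toy two-atom data, invented purely for illustration.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
full_model = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)],
]
clustered_model = full_model[1:]  # pretend clustering dropped the closest conformer

loss = accuracy_loss(full_model, clustered_model, reference)
```

When the clustered model is a subset of the full model, as in this toy case, the loss can never be negative, which matches the direction of the RMSD change reported above.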
This study shows that the RMSD accuracy of the PubChem3D conformer models performs as designed. In addition, the effect of PubChem3D sampling on 3-D similarity measures shows that average accuracy degrades linearly with increasing molecular size and flexibility. Generally speaking, one can likely expect the worst-case minimum accuracy of 90% or more of the PubChem3D ensembles to be 0.75, 1.09, 0.43, and 1.13, in terms of ST(ST-opt), ComboT(ST-opt), CT(CT-opt), and ComboT(CT-opt), respectively. This expected accuracy improves linearly as the molecule becomes smaller or less flexible.
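Two of the thresholds above exceed 1 because the combo-Tanimoto is the sum of the shape-Tanimoto and the color-Tanimoto, so it ranges from 0 to 2 while each component ranges from 0 to 1. The sketch below shows the arithmetic only. It assumes the overlap values have already been computed by some 3-D overlay tool, and the function names `tanimoto` and `combo_tanimoto` are hypothetical.

```python
def tanimoto(overlap_ab, self_a, self_b):
    """Tanimoto score from a mutual overlap and the two self-overlaps.

    Assumption: the overlap values (shape volume or feature overlap) come
    from an external 3-D overlay step that is not shown here.
    """
    denominator = self_a + self_b - overlap_ab
    if denominator <= 0:
        raise ValueError("overlap values are inconsistent")
    return overlap_ab / denominator


def combo_tanimoto(shape_tanimoto, color_tanimoto):
    """Sum of shape and color Tanimoto; ranges from 0 to 2."""
    return shape_tanimoto + color_tanimoto


# Invented numbers for illustration: a good shape match, a weaker feature match.
st = tanimoto(overlap_ab=80.0, self_a=100.0, self_b=100.0)
ct = tanimoto(overlap_ab=6.0, self_a=10.0, self_b=12.0)
combo = combo_tanimoto(st, ct)
```

In the shape-optimized variants both components are evaluated at the overlay that maximizes shape overlap, and in the color-optimized variants at the overlay that maximizes feature overlap, which is why four Tanimoto-based measures appear alongside RMSD.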