National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.
J Cheminform. 2012 Nov 7;4(1):28. doi: 10.1186/1758-2946-4-28.
To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Efficient use of PubChem3D resources therefore requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules.
The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). In the "best-conformer-pair" approach, the 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from the compound pair. When this approach was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive-non-inactive (NN) pairs for a given assay by amounts comparable to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores compared to the average increase for the random compound pairs.
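The best-conformer-pair approach described above can be illustrated with a minimal sketch. It assumes each compound is given as a list of conformers and that some pairwise conformer similarity function is available. The names `best_conformer_pair_score` and `toy_similarity` are hypothetical, and the toy similarity (each conformer reduced to a single number) is only a stand-in for illustration, not the shape or feature scores computed by PubChem3D.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

Conformer = TypeVar("Conformer")


def best_conformer_pair_score(
    conformers_a: Sequence[Conformer],
    conformers_b: Sequence[Conformer],
    similarity: Callable[[Conformer, Conformer], float],
) -> float:
    """Return the greatest similarity over all conformer pairs of two compounds."""
    if not conformers_a or not conformers_b:
        raise ValueError("each compound needs at least one conformer")
    # Enumerate every (conformer of A, conformer of B) pair and keep the best score.
    return max(similarity(a, b) for a, b in product(conformers_a, conformers_b))


def toy_similarity(a: float, b: float) -> float:
    """Hypothetical placeholder: similarity falls as two 1-D 'conformers' move apart."""
    return 1.0 / (1.0 + abs(a - b))


# Two hypothetical compounds, each with three toy conformers.
compound_a = [0.0, 1.0, 2.0]
compound_b = [5.0, 2.5, 4.0]

# Single-conformer score: only the first conformer of each compound is compared.
single_score = toy_similarity(compound_a[0], compound_b[0])

# Best-conformer-pair score: the maximum over all 3 x 3 conformer pairs.
best_score = best_conformer_pair_score(compound_a, compound_b, toy_similarity)
```

Because the single-conformer pair is one of the pairs enumerated, the best-conformer-pair score can never be lower than the single-conformer score. This is consistent with the increases in average similarity reported above for both random and NN pairs.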
These results suggest that, on average, the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of the non-inactive space from the random and inactive spaces, although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step toward understanding the effects of the conformational diversity of molecules upon 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may help in building search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries more efficiently.