PubChem3D: Shape compatibility filtering using molecular shape quadrupoles.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

J Cheminform. 2011 Jul 20;3:25. doi: 10.1186/1758-2946-3-25.

DOI: 10.1186/1758-2946-3-25
PMID: 21774809
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3158422/

Abstract

BACKGROUND

PubChem provides a 3-D neighboring relationship, which involves finding the maximal shape overlap between two static compound 3-D conformations, a computationally intensive step. It is highly desirable to avoid this overlap computation, especially if it can be determined with certainty that a conformer pair cannot meet the criteria to be a 3-D neighbor. As such, PubChem employs a series of pre-filters, based on the concept of volume, to remove approximately 65% of all conformer neighbor pairs prior to shape overlap optimization. Given that molecular volume, a somewhat vague concept, is rather effective, it leads one to wonder: can the existing PubChem 3-D neighboring relationship, which consists of billions of shape similar conformer pairs from tens of millions of unique small molecules, be used to identify additional shape descriptor relationships? Or, put more specifically, can one place an upper bound on shape similarity using other "fuzzy" shape-like concepts like length, width, and height?
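
To illustrate why a volume-based pre-filter can rule out a pair with certainty, here is a minimal sketch. It assumes the usual shape Tanimoto definition, ST = V_ab / (V_a + V_b - V_ab), and assumes the overlap volume V_ab can never exceed the smaller of the two self-volumes. Under those assumptions ST is bounded above by min(V_a, V_b) / max(V_a, V_b). The function names are hypothetical and this is not PubChem's actual pre-filter code.

```python
def st_upper_bound(volume_a: float, volume_b: float) -> float:
    """Upper bound on shape Tanimoto from the two self-volumes alone.

    ST = V_ab / (V_a + V_b - V_ab) grows with V_ab. Assuming
    V_ab <= min(V_a, V_b), the largest possible ST is min / max.
    """
    if volume_a <= 0 or volume_b <= 0:
        raise ValueError("volumes must be positive")
    return min(volume_a, volume_b) / max(volume_a, volume_b)


def can_skip_overlap(volume_a: float, volume_b: float,
                     st_threshold: float = 0.8) -> bool:
    """True when the pair cannot reach the ST threshold, so the costly
    overlap optimization can be skipped without losing a neighbor."""
    return st_upper_bound(volume_a, volume_b) < st_threshold
```

For example, conformers with volumes of 300 and 400 (arbitrary units) have a bound of 0.75, so the pair can be discarded at ST ≥ 0.8 without computing any overlap.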

RESULTS

Using a basis set of 4.18 billion 3-D neighbor pairs identified from single conformer per compound neighboring of 17.1 million molecules, shape descriptors were computed for all conformers. These steric shape descriptors included several forms of molecular volume and shape quadrupoles, which essentially embody the length, width, and height of a conformer. For a given 3-D neighbor conformer pair, the volume and each quadrupole component (Qx, Qy, and Qz) were binned and their frequency of occurrence was examined. Per molecular volume type, this effectively produced three different maps, one per quadrupole component (Qx, Qy, and Qz), of allowed values for the similarity metric, shape Tanimoto (ST) ≥ 0.8.

The efficiency of these relationships (in terms of true positive, true negative, false positive and false negative) as a function of ST threshold was determined in a test run of 13.2 billion conformer pairs not previously considered by the 3-D neighbor set. At an ST ≥ 0.8, a filtering efficiency of 40.4% of true negatives was achieved with only 32 false negatives out of 24 million true positives, when applying the separate Qx, Qy, and Qz maps in a series (Qxyz). This efficiency increased linearly as a function of ST threshold in the range 0.8-0.99. The Qx filter was consistently the most efficient followed by Qy and then by Qz. Use of a monopole volume showed the best overall performance, followed by the self-overlap volume and then by the analytic volume.

Application of the monopole-based Qxyz filter in a "real world" test of 3-D neighboring of 4,218 chemicals of biomedical interest against 26.1 million molecules in PubChem reduced the total CPU cost of neighboring by 24-38% and, if used as the initial filter, removed from consideration 48.3% of all conformer pairs at almost negligible computational overhead.
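
The map-based filtering described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it bins only the quadrupole component of each conformer (the maps in the abstract also involve volume), and the bin width, the data structures, and every function name are assumptions made for the example. A map records which pairs of bins were ever observed among known neighbor pairs, and a candidate pair is rejected as soon as one component falls outside its map.

```python
from typing import Iterable, Sequence

BinPair = tuple[int, int]


def to_bin(value: float, bin_width: float) -> int:
    """Index of the fixed-width bin containing value."""
    return int(value // bin_width)


def build_allowed_map(neighbor_pairs: Iterable[tuple[float, float]],
                      bin_width: float) -> set[BinPair]:
    """Record every (bin_a, bin_b) combination seen among known neighbors.

    Both orderings are stored because the neighbor relation is symmetric.
    """
    allowed: set[BinPair] = set()
    for value_a, value_b in neighbor_pairs:
        bin_a, bin_b = to_bin(value_a, bin_width), to_bin(value_b, bin_width)
        allowed.add((bin_a, bin_b))
        allowed.add((bin_b, bin_a))
    return allowed


def passes_qxyz(q_a: Sequence[float], q_b: Sequence[float],
                maps: Sequence[set[BinPair]], bin_width: float) -> bool:
    """Apply the Qx, Qy, and Qz maps in series.

    Returns False at the first component whose bin combination was never
    observed for a neighbor pair, so later maps are not consulted.
    """
    for component_a, component_b, allowed in zip(q_a, q_b, maps):
        key = (to_bin(component_a, bin_width), to_bin(component_b, bin_width))
        if key not in allowed:
            return False
    return True
```

A pair that passes all three maps still needs the full overlap computation; the filter only removes pairs whose descriptor combination was never seen at the chosen ST threshold, which is why a small number of false negatives remains possible on unseen data.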

CONCLUSION

Basic shape descriptors, such as those embodied by size, length, width, and height, can be highly effective in identifying shape incompatible compound conformer pairs. When performing a 3-D search using a shape similarity cut-off, computation can be avoided by identifying conformer pairs that cannot meet the result criteria. Applying this methodology as a filter for PubChem 3-D neighboring computation, an improvement of 31% was realized, increasing the average conformer pair throughput from 154,000 to 202,000 per second per CPU core.
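
As a quick check on the reported figure, the relative gain implied by the two throughput values can be computed directly. The helper name is hypothetical and the snippet uses only the numbers quoted above.

```python
def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of after over before."""
    return (after - before) / before


# Conformer pairs per second per CPU core, before and after the filter.
gain = relative_gain(154_000, 202_000)
print(f"{gain:.1%}")  # prints 31.2%
```

The result, about 31.2%, agrees with the 31% improvement stated in the conclusion.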

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/0b2cbe9c9935/1758-2946-3-25-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/b5a5fe8eee4e/1758-2946-3-25-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/23e8ae690abc/1758-2946-3-25-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/5b836a4a4634/1758-2946-3-25-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/9a7675c6feb3/1758-2946-3-25-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/0594afaabf05/1758-2946-3-25-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/57cf9268a40d/1758-2946-3-25-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/9ae48c480b28/1758-2946-3-25-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/fa6b1269b72c/1758-2946-3-25-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/cd8e0838654b/1758-2946-3-25-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/9e8d79c60986/1758-2946-3-25-11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c8/3158422/944319e599c3/1758-2946-3-25-12.jpg
Similar articles

1
PubChem3D: Shape compatibility filtering using molecular shape quadrupoles.
J Cheminform. 2011 Jul 20;3:25. doi: 10.1186/1758-2946-3-25.
2
PubChem3D: Similar conformers.
J Cheminform. 2011 May 9;3:13. doi: 10.1186/1758-2946-3-13.
3
Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis.
J Cheminform. 2012 Nov 7;4(1):28. doi: 10.1186/1758-2946-4-28.
4
PubChem3D: Biologically relevant 3-D similarity.
J Cheminform. 2011 Jul 22;3(1):26. doi: 10.1186/1758-2946-3-26.
5
Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets.
J Cheminform. 2016 Nov 4;8:62. doi: 10.1186/s13321-016-0163-1. eCollection 2016.
6
PubChem3D: conformer ensemble accuracy.
J Cheminform. 2013 Jan 7;5(1):1. doi: 10.1186/1758-2946-5-1.
7
PubChem3D: Diversity of shape.
J Cheminform. 2011 Mar 21;3:9. doi: 10.1186/1758-2946-3-9.
8
PubChem3D: a new resource for scientists.
J Cheminform. 2011 Sep 20;3(1):32. doi: 10.1186/1758-2946-3-32.
9
PubChem3D: Conformer generation.
J Cheminform. 2011 Jan 27;3(1):4. doi: 10.1186/1758-2946-3-4.
10
Dynamic clustering threshold reduces conformer ensemble size while maintaining a biologically relevant ensemble.
J Comput Aided Mol Des. 2010 Aug;24(8):675-86. doi: 10.1007/s10822-010-9365-1. Epub 2010 May 25.

Cited by

1
PUG-View: programmatic access to chemical annotations integrated in PubChem.
J Cheminform. 2019 Aug 9;11(1):56. doi: 10.1186/s13321-019-0375-2.
2
Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets.
J Cheminform. 2016 Nov 4;8:62. doi: 10.1186/s13321-016-0163-1. eCollection 2016.
3
PubChem Substance and Compound databases.
Nucleic Acids Res. 2016 Jan 4;44(D1):D1202-13. doi: 10.1093/nar/gkv951. Epub 2015 Sep 22.
4
PubChem structure-activity relationship (SAR) clusters.
J Cheminform. 2015 Jul 7;7:33. doi: 10.1186/s13321-015-0070-x. eCollection 2015.
5
PubChem3D: conformer ensemble accuracy.
J Cheminform. 2013 Jan 7;5(1):1. doi: 10.1186/1758-2946-5-1.
6
Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis.
J Cheminform. 2012 Nov 7;4(1):28. doi: 10.1186/1758-2946-4-28.
7
PubChem3D: a new resource for scientists.
J Cheminform. 2011 Sep 20;3(1):32. doi: 10.1186/1758-2946-3-32.
8
PubChem3D: Biologically relevant 3-D similarity.
J Cheminform. 2011 Jul 22;3(1):26. doi: 10.1186/1758-2946-3-26.

References

1
PubChem3D: Similar conformers.
J Cheminform. 2011 May 9;3:13. doi: 10.1186/1758-2946-3-13.
2
PubChem3D: Diversity of shape.
J Cheminform. 2011 Mar 21;3:9. doi: 10.1186/1758-2946-3-9.
3
PubChem3D: Conformer generation.
J Cheminform. 2011 Jan 27;3(1):4. doi: 10.1186/1758-2946-3-4.
4
An overview of the PubChem BioAssay resource.
Nucleic Acids Res. 2010 Jan;38(Database issue):D255-66. doi: 10.1093/nar/gkp965. Epub 2009 Nov 19.
5
Database resources of the National Center for Biotechnology Information.
Nucleic Acids Res. 2010 Jan;38(Database issue):D5-16. doi: 10.1093/nar/gkp967. Epub 2009 Nov 12.
6
PubChem: a public information system for analyzing bioactivities of small molecules.
Nucleic Acids Res. 2009 Jul;37(Web Server issue):W623-33. doi: 10.1093/nar/gkp456. Epub 2009 Jun 4.
7
Small molecule shape-fingerprints.
J Chem Inf Model. 2005 May-Jun;45(3):673-84. doi: 10.1021/ci049651v.
8
A new class of molecular shape descriptors. 1. Theory and properties.
J Chem Inf Comput Sci. 2002 Mar-Apr;42(2):259-73. doi: 10.1021/ci000100o.