GEOM, energy-annotated molecular conformations for property prediction and molecular generation.

Affiliations

Harvard University, Department of Chemistry and Chemical Biology, Cambridge, MA, 02138, USA.

Massachusetts Institute of Technology, Department of Materials Science and Engineering, Cambridge, MA, 02139, USA.

Publication information

Sci Data. 2022 Apr 21;9(1):185. doi: 10.1038/s41597-022-01288-4.

DOI:10.1038/s41597-022-01288-4
PMID:35449137
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9023519/
Abstract

Machine learning (ML) outperforms traditional approaches in many molecular design tasks. ML models usually predict molecular properties from a 2D chemical graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a molecule. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and experimental data. Here we use advanced sampling and semi-empirical density functional theory (DFT) to generate 37 million molecular conformations for over 450,000 molecules. The Geometric Ensemble Of Molecules (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with experimental data related to biophysics, physiology, and physical chemistry. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
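
To make the idea of using an energy-annotated conformer ensemble as input more concrete, the following is a minimal sketch of Boltzmann weighting, a standard way to turn relative conformer energies into statistical weights and an ensemble-averaged property. It is not taken from the GEOM paper or its code: the function names, the choice of kcal/mol as the energy unit, the temperature of 298.15 K, and the example numbers are all assumptions made for illustration.

```python
import math

# Gas constant R in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Return normalized Boltzmann weights for a list of conformer energies.

    Hypothetical helper: energies are assumed to be in kcal/mol.
    """
    if not energies_kcal:
        raise ValueError("need at least one conformer energy")
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    rt = R_KCAL_PER_MOL_K * temperature_k
    factors = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(values, weights):
    """Weighted average of a per-conformer property (hypothetical helper)."""
    return sum(v * w for v, w in zip(values, weights))


# Illustrative numbers only, not data from GEOM.
energies = [0.0, 0.5, 2.0]   # relative conformer energies, kcal/mol
dipoles = [1.2, 1.8, 3.1]    # some per-conformer property
weights = boltzmann_weights(energies)
average_dipole = ensemble_average(dipoles, weights)
```

Lower-energy conformers receive larger weights, so the averaged property is dominated by the most accessible geometries. A model that consumes conformer ensembles could use such weights as one input, although how a given model combines conformers is a separate design choice.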

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffa6/9023519/3f83dcec4254/41597_2022_1288_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffa6/9023519/d7bbf0f8fe58/41597_2022_1288_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffa6/9023519/dc3914f618cc/41597_2022_1288_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffa6/9023519/d6f9b2da3136/41597_2022_1288_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ffa6/9023519/31e4a2e99f29/41597_2022_1288_Fig5_HTML.jpg
