
Dataset's chemical diversity limits the generalizability of machine learning predictions.

Authors

Glavatskikh Marta, Leguy Jules, Hunault Gilles, Cauchy Thomas, Da Mota Benoit

Affiliations

LERIA, University of Angers, 2 Bd Lavoisier, 49045, Angers, France.

Laboratoire MOLTECH-Anjou, UMR CNRS 6200, SFR MATRIX, UNIV Angers, 2 Bd Lavoisier, 49045, Angers, France.

Publication information

J Cheminform. 2019 Nov 12;11(1):69. doi: 10.1186/s13321-019-0391-2.

DOI:10.1186/s13321-019-0391-2
PMID:33430991
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6852905/
Abstract

The QM9 dataset has become the golden standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 "heavy" atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
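
The cross-dataset test described in the abstract (train on one dataset, predict the other) can be illustrated with a minimal sketch. It is not the authors' code and uses neither QM9 nor PC9: the two "datasets" are synthetic point clouds, `toy_property` is a made-up target, and the kernel width `sigma` and regularization `lam` are arbitrary assumptions. It only shows the shape of the experiment, using a plain NumPy Kernel Ridge Regression with a Gaussian kernel.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances between the rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(x, y, sigma=1.0, lam=1e-3):
    # Closed-form KRR weights: alpha = (K + lam * I)^-1 y.
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, sigma=1.0):
    # Prediction is a kernel-weighted sum over the training points.
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def toy_property(x):
    # Hypothetical smooth target standing in for a molecular energy.
    return np.sin(x).sum(axis=1)

rng = np.random.default_rng(0)
# "Narrow" set: descriptors confined to a small region (low diversity).
x_narrow = rng.uniform(-1.0, 1.0, size=(200, 2))
# "Diverse" set: same size, but covering a wider region.
x_diverse = rng.uniform(-3.0, 3.0, size=(200, 2))
y_narrow, y_diverse = toy_property(x_narrow), toy_property(x_diverse)

alpha_narrow = krr_fit(x_narrow, y_narrow)
alpha_diverse = krr_fit(x_diverse, y_diverse)

# Cross-dataset evaluation: each model predicts the other dataset.
mae_narrow_to_diverse = mae(
    y_diverse, krr_predict(x_narrow, alpha_narrow, x_diverse))
mae_diverse_to_narrow = mae(
    y_narrow, krr_predict(x_diverse, alpha_diverse, x_narrow))

print("train narrow, test diverse: MAE =", mae_narrow_to_diverse)
print("train diverse, test narrow: MAE =", mae_diverse_to_narrow)
```

In this toy setting the model trained on the wider set has seen the region occupied by the narrow set, while the reverse is not true, which is the kind of asymmetry the abstract reports for PC9 and QM9. The numbers it prints say nothing about the real datasets.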

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/cbea6df44434/13321_2019_391_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/82448ef5f15a/13321_2019_391_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/ec0378b15ad6/13321_2019_391_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/b0033f10fa5b/13321_2019_391_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/0ca6c53454e4/13321_2019_391_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/221df2687ec4/13321_2019_391_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/250976ec3bb0/13321_2019_391_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/15f2b81f6b8c/13321_2019_391_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/3931ab6b69f7/13321_2019_391_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c16d/6852905/181929c4cafd/13321_2019_391_Fig10_HTML.jpg

Similar articles

1
Dataset's chemical diversity limits the generalizability of machine learning predictions.
J Cheminform. 2019 Nov 12;11(1):69. doi: 10.1186/s13321-019-0391-2.
2
Chemical diversity in molecular orbital energy predictions with kernel ridge regression.
J Chem Phys. 2019 May 28;150(20):204121. doi: 10.1063/1.5086105.
3
Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies.
J Chem Theory Comput. 2021 Aug 10;17(8):4769-4785. doi: 10.1021/acs.jctc.1c00363. Epub 2021 Jul 21.
4
Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error.
J Chem Theory Comput. 2017 Nov 14;13(11):5255-5264. doi: 10.1021/acs.jctc.7b00577. Epub 2017 Oct 10.
5
MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods.
Sci Data. 2023 Nov 8;10(1):783. doi: 10.1038/s41597-023-02690-2.
6
Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization.
J Cheminform. 2021 Oct 2;13(1):76. doi: 10.1186/s13321-021-00554-8.
7
OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features.
J Chem Phys. 2020 Sep 28;153(12):124111. doi: 10.1063/5.0021955.
8
Novel machine learning insights into the QM7b and QM9 quantum mechanics datasets.
J Comput Chem. 2024 Jun 5;45(15):1193-1214. doi: 10.1002/jcc.27295. Epub 2024 Feb 8.
9
Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions.
J Chem Inf Model. 2022 Dec 26;62(24):6336-6341. doi: 10.1021/acs.jcim.2c00245. Epub 2022 Jun 27.
10
Machine Learning Prediction of Nine Molecular Properties Based on the SMILES Representation of the QM9 Quantum-Chemistry Dataset.
J Phys Chem A. 2020 Nov 25;124(47):9854-9866. doi: 10.1021/acs.jpca.0c05969. Epub 2020 Nov 11.

Cited by

1
Active Learning Improves Ionization Efficiency Predictions and Quantification in Nontargeted LC/HRMS.
Anal Chem. 2025 Jul 1;97(25):13131-13139. doi: 10.1021/acs.analchem.5c00816. Epub 2025 Jun 13.
2
RGBChem: Image-Like Representation of Chemical Compounds for Property Prediction.
J Chem Theory Comput. 2025 May 27;21(10):5322-5333. doi: 10.1021/acs.jctc.5c00291. Epub 2025 May 12.
3
Molecular analysis and design using generative artificial intelligence multi-agent modeling.
Mol Syst Des Eng. 2025 Jan 24;10(4):314-337. doi: 10.1039/d4me00174e. eCollection 2025 Mar 31.
4
Pretraining graph transformers with atom-in-a-molecule quantum properties for improved ADMET modeling.
J Cheminform. 2025 Feb 27;17(1):25. doi: 10.1186/s13321-025-00970-0.
5
QM9star, two Million DFT-computed Equilibrium Structures for Ions and Radicals with Atomic Information.
Sci Data. 2024 Oct 21;11(1):1158. doi: 10.1038/s41597-024-03933-6.
6
Quantum Chemistry Dataset with Ground- and Excited-state Properties of 450 Kilo Molecules.
Sci Data. 2024 Aug 29;11(1):948. doi: 10.1038/s41597-024-03788-x.
7
Machine Learning Approach to Vertical Energy Gap in Redox Processes.
J Chem Theory Comput. 2024 Aug 13;20(15):6747-6755. doi: 10.1021/acs.jctc.4c00715. Epub 2024 Jul 23.
8
Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines.
Sci Rep. 2023 Nov 29;13(1):21069. doi: 10.1038/s41598-023-42952-y.
9
Integrated Molecular Modeling and Machine Learning for Drug Design.
J Chem Theory Comput. 2023 Nov 14;19(21):7478-7495. doi: 10.1021/acs.jctc.3c00814. Epub 2023 Oct 26.
10
Generative organic electronic molecular design informed by quantum chemistry.
Chem Sci. 2023 Sep 13;14(40):11045-11055. doi: 10.1039/d3sc03781a. eCollection 2023 Oct 18.

References

1
Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning.
Nat Commun. 2019 Jul 1;10(1):2903. doi: 10.1038/s41467-019-10827-4.
2
Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra.
Adv Sci (Weinh). 2019 Jan 29;6(9):1801367. doi: 10.1002/advs.201801367. eCollection 2019 May 3.
3
PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges.
J Chem Theory Comput. 2019 Jun 11;15(6):3678-3693. doi: 10.1021/acs.jctc.9b00181. Epub 2019 May 14.
4
A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules.
J Chem Phys. 2019 Apr 7;150(13):131103. doi: 10.1063/1.5088393.
5
Data sampling scheme for reproducing energies along reaction coordinates in high-dimensional neural network potentials.
J Chem Phys. 2019 Apr 7;150(13):134103. doi: 10.1063/1.5078394.
6
Training Neural Nets To Learn Reactive Potential Energy Surfaces Using Interactive Quantum Chemistry in Virtual Reality.
J Phys Chem A. 2019 May 23;123(20):4486-4499. doi: 10.1021/acs.jpca.9b01006. Epub 2019 Apr 18.
7
Learning from Failure: Predicting Electronic Structure Calculation Outcomes with Machine Learning Models.
J Chem Theory Comput. 2019 Apr 9;15(4):2331-2345. doi: 10.1021/acs.jctc.9b00057. Epub 2019 Mar 22.
8
Accurate molecular polarizabilities with coupled cluster theory and machine learning.
Proc Natl Acad Sci U S A. 2019 Feb 26;116(9):3401-3406. doi: 10.1073/pnas.1816132116. Epub 2019 Feb 7.
9
Transferable Machine-Learning Model of the Electron Density.
ACS Cent Sci. 2019 Jan 23;5(1):57-64. doi: 10.1021/acscentsci.8b00551. Epub 2018 Dec 26.
10
Machine learning model for non-equilibrium structures and energies of simple molecules.
J Chem Phys. 2019 Jan 14;150(2):024307. doi: 10.1063/1.5054968.