Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Methods.

Authors

Mojtaba Haghighatlari, Jie Li, Farnaz Heidar-Zadeh, Yuchen Liu, Xingyi Guan, Teresa Head-Gordon

Affiliations

Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA.

Center for Molecular Modeling (CMM), Ghent University, B-9052 Ghent, Belgium.

Publication information

Chem. 2020 Jul 9;6(7):1527-1542. doi: 10.1016/j.chempr.2020.05.014. Epub 2020 Jun 16.

DOI: 10.1016/j.chempr.2020.05.014

PMID: 32695924

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7373218/

Abstract

Recently, supervised machine learning has been on the rise as a source of new predictive approaches for applications in the chemical, biological, and materials sciences. In this Perspective, we focus on the interplay of machine learning methods with chemically motivated descriptors and with the size and type of data sets needed for molecular property prediction. Using nuclear magnetic resonance (NMR) chemical shift prediction as an example, we demonstrate that success depends on the choice of feature-extracted or real-space representations of chemical structures, on whether the molecular property data are abundant and whether they are experimentally or computationally derived, and on how these factors together influence the correct choice among popular machine learning methods drawn from deep learning, random forests, or kernel methods.

Similar articles

Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Methods.

Chem. 2020 Jul 9;6(7):1527-1542. doi: 10.1016/j.chempr.2020.05.014. Epub 2020 Jun 16.

Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties.

Acc Chem Res. 2021 Feb 16;54(4):827-836. doi: 10.1021/acs.accounts.0c00745. Epub 2021 Feb 3.

How to approach machine learning-based prediction of drug/compound-target interactions.

J Cheminform. 2023 Feb 6;15(1):16. doi: 10.1186/s13321-023-00689-w.

Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models.

J Chem Inf Model. 2023 Feb 27;63(4):1099-1113. doi: 10.1021/acs.jcim.2c01189. Epub 2023 Feb 9.

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

An Ensemble Structure and Physicochemical (SPOC) Descriptor for Machine-Learning Prediction of Chemical Reaction and Molecular Properties.

Chemphyschem. 2022 Jul 19;23(14):e202200255. doi: 10.1002/cphc.202200255. Epub 2022 May 19.

Transfer learning for small molecule retention predictions.

J Chromatogr A. 2021 May 10;1644:462119. doi: 10.1016/j.chroma.2021.462119. Epub 2021 Mar 31.

Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset.

Data Brief. 2024 Feb 9;53:110178. doi: 10.1016/j.dib.2024.110178. eCollection 2024 Apr.

Chemi-Net: A Molecular Graph Convolutional Network for Accurate Drug Property Prediction.

Int J Mol Sci. 2019 Jul 10;20(14):3389. doi: 10.3390/ijms20143389.

Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design.

Acc Chem Res. 2021 Feb 2;54(3):532-545. doi: 10.1021/acs.accounts.0c00686. Epub 2021 Jan 22.

Cited by

Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World.

Chem Res Toxicol. 2025 May 19;38(5):759-807. doi: 10.1021/acs.chemrestox.5c00033. Epub 2025 May 2.

UCBShift 2.0: Bridging the Gap from Backbone to Side Chain Protein Chemical Shift Prediction for Protein Structures.

J Am Chem Soc. 2024 Nov 20;146(46):31733-31745. doi: 10.1021/jacs.4c10474. Epub 2024 Nov 12.

Catalysing (organo-)catalysis: Trends in the application of machine learning to enantioselective organocatalysis.

Beilstein J Org Chem. 2024 Sep 10;20:2280-2304. doi: 10.3762/bjoc.20.196. eCollection 2024.

Exploring an accurate machine learning model to quickly estimate stability of diverse energetic materials.

iScience. 2024 Mar 8;27(4):109452. doi: 10.1016/j.isci.2024.109452. eCollection 2024 Apr 19.

Highly Accurate Prediction of NMR Chemical Shifts from Low-Level Quantum Mechanics Calculations Using Machine Learning.

J Chem Theory Comput. 2024 Mar 12;20(5):2152-2166. doi: 10.1021/acs.jctc.3c01256. Epub 2024 Feb 8.

Integrated Molecular Modeling and Machine Learning for Drug Design.

J Chem Theory Comput. 2023 Nov 14;19(21):7478-7495. doi: 10.1021/acs.jctc.3c00814. Epub 2023 Oct 26.

Accurate, interpretable predictions of materials properties within transformer language models.

Patterns (N Y). 2023 Aug 2;4(10):100803. doi: 10.1016/j.patter.2023.100803. eCollection 2023 Oct 13.

Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review.

Int J Mol Sci. 2023 Jul 15;24(14):11488. doi: 10.3390/ijms241411488.

Force field-inspired transformer network assisted crystal density prediction for energetic materials.

J Cheminform. 2023 Jul 19;15(1):65. doi: 10.1186/s13321-023-00736-6.

Retention time prediction for chromatographic enantioseparation by quantile geometry-enhanced graph neural network.

Nat Commun. 2023 May 29;14(1):3095. doi: 10.1038/s41467-023-38853-3.

References cited in this article

Accurate prediction of chemical shifts for aqueous protein structure on "Real World" data.

Chem Sci. 2020 Mar 3;11(12):3180-3191. doi: 10.1039/c9sc06561j.

IMPRESSION - prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy.

Chem Sci. 2019 Nov 20;11(2):508-515. doi: 10.1039/c9sc03854j. eCollection 2020 Jan 14.

FCHL revisited: Faster and more accurate quantum machine learning.

J Chem Phys. 2020 Jan 31;152(4):044107. doi: 10.1063/1.5126701.

Machine learning approaches for analyzing and enhancing molecular dynamics simulations.

Curr Opin Struct Biol. 2020 Apr;61:139-145. doi: 10.1016/j.sbi.2019.12.016. Epub 2020 Jan 20.

Improved protein structure prediction using potentials from deep learning.

Nature. 2020 Jan;577(7792):706-710. doi: 10.1038/s41586-019-1923-7. Epub 2020 Jan 15.

Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning.

Nat Commun. 2019 Nov 22;10(1):5316. doi: 10.1038/s41467-019-13297-w.

Predicting Materials Properties with Little Data Using Shotgun Transfer Learning.

ACS Cent Sci. 2019 Oct 23;5(10):1717-1730. doi: 10.1021/acscentsci.9b00804. Epub 2019 Sep 30.

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction.

ACS Cent Sci. 2019 Sep 25;5(9):1572-1583. doi: 10.1021/acscentsci.9b00576. Epub 2019 Aug 30.

Analyzing Learned Molecular Representations for Property Prediction.

J Chem Inf Model. 2019 Aug 26;59(8):3370-3388. doi: 10.1021/acs.jcim.9b00237. Epub 2019 Aug 13.

Reconciling modern machine-learning practice and the classical bias-variance trade-off.

Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15849-15854. doi: 10.1073/pnas.1903070116. Epub 2019 Jul 24.
