Haghighatlari Mojtaba, Li Jie, Heidar-Zadeh Farnaz, Liu Yuchen, Guan Xingyi, Head-Gordon Teresa
Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, CA, USA.
Center for Molecular Modeling (CMM), Ghent University, B-9052 Ghent, Belgium.
Chem. 2020 Jul 9;6(7):1527-1542. doi: 10.1016/j.chempr.2020.05.014. Epub 2020 Jun 16.
Supervised machine learning has recently gained prominence as a source of new predictive approaches for applications in the chemical, biological, and materials sciences. In this Perspective, we focus on the interplay between the machine learning method, the chemically motivated descriptors, and the size and type of data sets needed for molecular property prediction. Using nuclear magnetic resonance (NMR) chemical shift prediction as an example, we demonstrate that success is predicated on the choice of feature-extracted or real-space representations of chemical structures, on whether the molecular property data are abundant and whether they are experimentally or computationally derived, and on how these factors together influence the correct choice among popular machine learning methods drawn from deep learning, random forests, or kernel methods.