Tang Hao, Xiao Brian, He Wenhao, Subasic Pero, Harutyunyan Avetik R, Wang Yao, Liu Fang, Xu Haowei, Li Ju
Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA.
Nat Comput Sci. 2025 Feb;5(2):144-154. doi: 10.1038/s43588-024-00747-9. Epub 2024 Dec 27.
Machine learning plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules; however, most existing machine learning models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, so their prediction accuracy cannot surpass that of DFT. In this work, we developed a unified machine learning method for the electronic structures of organic molecules using gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with several widely used hybrid and double-hybrid functionals in terms of both computational cost and prediction accuracy for various quantum chemical properties. We apply the model to aromatic compounds and semiconducting polymers, evaluating both ground- and excited-state properties. The results demonstrate the model's accuracy and its ability to generalize to complex systems that cannot be calculated using CCSD(T)-level methods because of their steep computational scaling with system size.
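The scaling limitation mentioned in the abstract can be illustrated with a rough arithmetic sketch. It is not taken from the paper. It assumes the commonly quoted formal scaling of O(N^7) for canonical CCSD(T) and, as a further assumption, O(N^4) for hybrid DFT, where N is a measure of system size. The function name `relative_cost` and the constants are hypothetical, chosen only for this illustration.

```python
def relative_cost(size_factor: float, exponent: int) -> float:
    """Cost multiplier when system size grows by `size_factor`
    under a formal O(N**exponent) scaling law."""
    return size_factor ** exponent


# Assumed formal scaling exponents (textbook values, not from the paper).
CCSDT_EXPONENT = 7       # canonical CCSD(T)
HYBRID_DFT_EXPONENT = 4  # hybrid DFT with exact exchange

for factor in (2, 4, 10):
    ccsdt = relative_cost(factor, CCSDT_EXPONENT)
    dft = relative_cost(factor, HYBRID_DFT_EXPONENT)
    print(f"size x{factor}: CCSD(T) cost x{ccsdt:,}, hybrid DFT cost x{dft:,}")
```

Under these assumptions, doubling the system size multiplies the CCSD(T) cost by 2^7 = 128 but the hybrid DFT cost by only 2^4 = 16, which gives a sense of why large polymers are out of reach for direct CCSD(T) calculations.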