Ma Shuo, Ma Yingjin, Zhang Baohua, Tian Yingqi, Jin Zhong
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China.
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China.
ACS Omega. 2021 Jan 14;6(3):2001-2024. doi: 10.1021/acsomega.0c04981. eCollection 2021 Jan 26.
With a view to achieving better performance in task assignment and load balancing, a top-level-designed forecasting system for predicting the computational times of density-functional theory (DFT)/time-dependent DFT (TDDFT) calculations is presented. The computational time is assumed to be an intrinsic property of the molecule. Based on this assumption, the forecasting system is built as a "reinforced concrete" that combines cheminformatics, several machine-learning (ML) models, and the framework of the many-worlds interpretation (MWI) in a multiverse ansatz. Here, cheminformatics is used to recognize the topological structure of molecules, the ML models are used to build the relationships between topology and computational cost, and the MWI framework is used to hold the various combinations of DFT functionals and basis sets in DFT/TDDFT calculations. Calculated results for molecules from the DrugBank dataset show that (1) the system can give quantitative predictions of computational costs: with exactly pretrained ML models, typical mean relative errors can be less than 0.2 for DFT/TDDFT calculations, with deviations of ±25%, and (2) it can also be applied to combinations of DFT functional and basis set for which no exactly pretrained ML model is available, with only slightly larger prediction errors.
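The accuracy figures quoted in the abstract are mean relative errors of predicted against measured computational times. A minimal sketch of that metric follows, assuming the usual definition (the average of |predicted - actual| / actual). The function names and the timing values are hypothetical illustrations, not data or code from the paper.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual over all jobs."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


def fraction_within(predicted, actual, tolerance):
    """Share of jobs whose relative error does not exceed `tolerance`."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) / a <= tolerance)
    return hits / len(actual)


# Hypothetical wall-clock times in seconds (made up for illustration).
actual_s = [100.0, 200.0, 400.0]
predicted_s = [110.0, 180.0, 440.0]

mre = mean_relative_error(predicted_s, actual_s)
share_25 = fraction_within(predicted_s, actual_s, 0.25)
print(f"MRE = {mre:.3f}, share within 25% = {share_25:.2f}")
```

A scheduler could compare such a value against the 0.2 threshold mentioned in the abstract before relying on predicted times for load balancing.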