Pereira Florbela, Xiao Kaixia, Latino Diogo A R S, Wu Chengcheng, Zhang Qingyou, Aires-de-Sousa Joao
LAQV and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal.
Henan Engineering Research Center of Industrial Circulating Water Treatment, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, PR China.
J Chem Inf Model. 2017 Jan 23;57(1):11-21. doi: 10.1021/acs.jcim.6b00340. Epub 2016 Dec 29.
Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, using molecular descriptors based exclusively on connectivity. The project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database of more than 111,000 structures, development of new molecular descriptors, and training and validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set. Random forest models predicted an external test set of 9989 compounds with mean absolute errors (MAE) as low as 0.15 and 0.16 eV for the HOMO and LUMO orbitals, respectively. The impact of the quantum chemical calculation protocol was assessed with a subset of compounds. Including the orbital energy calculated by PM7 as an additional descriptor significantly improved the quality of the estimations, reducing the MAE by more than 30%.
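Two of the quantities named in the abstract, the Euclidean-distance applicability domain and the mean absolute error, can be illustrated with a minimal sketch. The abstract does not say how the distance criterion was defined, so the nearest-neighbour rule and the `cutoff` parameter below are assumptions made for illustration, as are all function names and the toy descriptor vectors and energies; none of this is the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_distance(query, training_set):
    """Distance from a query vector to its closest training-set vector."""
    return min(euclidean(query, t) for t in training_set)


def in_applicability_domain(query, training_set, cutoff):
    """Assumed rule: a compound is inside the domain when its nearest
    training-set neighbour lies within `cutoff` (a hypothetical parameter)."""
    return nearest_training_distance(query, training_set) <= cutoff


def mean_absolute_error(predicted, reference):
    """MAE between predicted and reference orbital energies (same units)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy descriptor vectors (made up for illustration, not real descriptors).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
queries = [(0.1, 0.1), (3.0, 4.0)]
flags = [in_applicability_domain(q, training, cutoff=0.5) for q in queries]

# Toy HOMO energies in eV (made up for illustration).
mae = mean_absolute_error([-5.9, -6.3], [-6.0, -6.1])
```

Here the first query sits close to a training point and is flagged as inside the domain, while the second is far from every training point and is flagged as outside. In practice the cutoff would have to be calibrated on the training data, and descriptors would typically be scaled before distances are computed.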