Yang Peisong, Zhang Huan, Lai Xin, Wang Kunfeng, Yang Qingyuan, Yu Duli
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China.
State Key Laboratory of Organic-Inorganic Composites, Beijing University of Chemical Technology, Beijing 100029, China.
ACS Omega. 2021 Jun 25;6(27):17149-17161. doi: 10.1021/acsomega.0c05990. eCollection 2021 Jul 13.
Covalent organic frameworks (COFs) offer high thermal stability and large specific surface areas and have great application prospects in gas storage and catalysis. This article focuses on the methane (CH₄) working capacity of COFs. Because the number of possible COF structures is vast, finding suitable materials with traditional calculation methods is time-consuming, so it is important to apply appropriate machine learning (ML) algorithms to build accurate prediction models. A major obstacle to the use of ML algorithms is that their performance may be affected by many design decisions, and finding an appropriate algorithm and model parameters is quite a challenge for nonprofessionals. In this work, we use automated machine learning (AutoML) to analyze the CH₄ working capacity of 403,959 COFs. We explore the relationship between the working capacity and 23 features covering the structure, chemical characteristics, and atom types of the COFs. We then compare the tree-based pipeline optimization tool (TPOT), an AutoML method, with traditional ML methods whose model parameters are set manually: multiple linear regression, support vector machine, decision tree, and random forest. TPOT not only removes the need for complex data preprocessing and model parameter tuning but also performs better than the traditional ML models. Compared with traditional grand canonical Monte Carlo simulations, it saves a lot of time. AutoML removes the need for ML expertise, so researchers from other fields can configure parameters automatically and obtain highly accurate and easy-to-understand results, which is of great significance for material screening.
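The manually configured baselines named in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the data are synthetic (generated with `make_regression`, with 23 features only to mirror the feature count), the helper `compare_baselines` is hypothetical, and every hyperparameter value is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def compare_baselines(X, y, cv=3, seed=0):
    """Return the mean cross-validated R^2 of four hand-configured regressors.

    Hypothetical helper; the hyperparameters below are illustrative
    assumptions, not values reported in the paper.
    """
    models = {
        "multiple linear regression": LinearRegression(),
        # SVR is sensitive to feature scale, so standardize the inputs first.
        "support vector machine": make_pipeline(StandardScaler(), SVR(C=10.0)),
        "decision tree": DecisionTreeRegressor(max_depth=8, random_state=seed),
        "random forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for name, model in models.items()
    }


# Synthetic stand-in for the COF data set: 300 samples, 23 features.
X, y = make_regression(
    n_samples=300, n_features=23, n_informative=10, noise=5.0, random_state=0
)
scores = compare_baselines(X, y)
for name, r2 in scores.items():
    print(f"{name}: mean R^2 = {r2:.3f}")
```

With the `tpot` package installed, a `TPOTRegressor` could be fitted on the same data and scored in the same way; it is left out here so that the sketch depends only on NumPy and scikit-learn.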