Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, Connecticut 06269, United States.
Institute of Materials Science, University of Connecticut, Storrs, Connecticut 06269, United States.
ACS Comb Sci. 2017 Oct 9;19(10):640-645. doi: 10.1021/acscombsci.7b00056. Epub 2017 Sep 5.
Using molecular simulation for adsorbent screening is computationally expensive, which makes it prohibitive for materials discovery. Machine learning (ML) algorithms trained on fundamental material properties can potentially provide quick and accurate methods for screening purposes. Prior efforts have focused on structural descriptors for use with ML. In this work, chemical descriptors were introduced alongside structural descriptors for adsorption analysis. Structural and chemical descriptors were evaluated with various ML algorithms, including decision tree, Poisson regression, support vector machine, and random forest, to predict methane uptake on hypothetical metal-organic frameworks (MOFs). To highlight their predictive capabilities, ML models were trained on 8% of a data set consisting of 130,398 MOFs and then tested on the remaining 92% to predict methane adsorption capacities. When structural and chemical descriptors were jointly used as ML input, the random forest model with 10-fold cross validation proved superior to the other ML approaches, with an R² of 0.98 and a mean absolute percent error of about 7%. Training and prediction with the random forest algorithm for adsorption capacity estimation of all 130,398 MOFs took approximately 2 h on a single personal computer, several orders of magnitude faster than actual molecular simulations on high-performance computing clusters.
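The evaluation protocol described in the abstract (train on 8% of the data, test on the remaining 92%, report R² and mean absolute percent error) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the single descriptor and the uptake values are hypothetical, and a one-variable least-squares line stands in for the random forest model, which is not reproduced here. The helper names (`split_indices`, `r_squared`, `mape`, `fit_line`) are not from the paper.

```python
import random


def split_indices(n, train_fraction, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_fraction)
    return idx[:n_train], idx[n_train:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percent error, in percent (y_true must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (stand-in for the real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


# Synthetic data: one hypothetical descriptor and a noisy, made-up uptake value.
rng = random.Random(42)
n_samples = 1000
x = [rng.uniform(0.1, 1.0) for _ in range(n_samples)]
y = [50.0 + 150.0 * xi + rng.gauss(0.0, 5.0) for xi in x]

# Train on 8% of the samples, evaluate on the held-out 92%.
train, test = split_indices(n_samples, 0.08)
a, b = fit_line([x[i] for i in train], [y[i] for i in train])
y_test = [y[i] for i in test]
y_hat = [a * x[i] + b for i in test]

r2 = r_squared(y_test, y_hat)
err = mape(y_test, y_hat)
print(f"train={len(train)} test={len(test)} R2={r2:.3f} MAPE={err:.1f}%")
```

The numbers printed by this sketch describe only the synthetic data and say nothing about the MOF data set. In practice the stand-in line fit would be replaced by a random forest regressor trained on the structural and chemical descriptors, with cross validation applied within the training portion.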