Liang Heng, Jiang Kun, Yan Tong-An, Chen Guang-Hui
Department of Chemistry, Key Laboratory for Preparation and Application of Ordered Structural Materials of Guangdong Province, Shantou University, Shantou 515063, Guangdong, China.
Department of Natural Science, Shantou Polytechnic, Shantou 515041, Guangdong, China.
ACS Omega. 2021 Mar 19;6(13):9066-9076. doi: 10.1021/acsomega.1c00100. eCollection 2021 Apr 6.
The inert gases Xe and Kr occur mainly in used nuclear fuel (UNF), at a Xe/Kr ratio of 20:80, and are difficult to separate. In this work, the G-MOFs database was screened by high-throughput computation for metal-organic frameworks (MOFs) with high Xe/Kr adsorption selectivity. For the first time, this screening combined grand canonical Monte Carlo (GCMC) simulations with machine learning (ML). A comparison of eight classical ML models shows that the XGBoost model with seven structural descriptors gives the most accurate predictions of the Xe/Kr adsorption and separation performance of MOFs. Structural descriptors are also easier to obtain than energetic or electronic descriptors. The coefficients of determination (R²) of the generalized model for Xe adsorption and Xe/Kr selectivity are close to 1, at 0.951 and 0.973, respectively. In addition, of the top 1000 MOFs ranked by GCMC-simulated adsorption capacity and by GCMC-simulated selectivity, the XGBoost model correctly identified 888 and 896, respectively. Feature-importance analysis of the XGBoost model shows that the density (ρ), porosity (ϕ), pore volume (Vol), and pore limiting diameter (PLD) of MOFs are the key features governing Xe/Kr adsorption. To test the generalization ability of the XGBoost model, we also screened MOF adsorbents for the CO₂/CH₄ mixture. Despite the imbalanced data, XGBoost again clearly outperformed the traditional ML models. The MOF feature space is low-dimensional while the number of MOF samples in the database is very large. This combination suits models such as XGBoost, which search for the global minimum of the cost function, better than models that rely on feature creation. This study is the first report of using the XGBoost algorithm to discover MOF adsorbents.
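The abstract reports model quality in two ways: a coefficient of determination and a count of how many of the top 1000 GCMC-ranked MOFs the model also recovers. Below is a minimal sketch of both metrics. It assumes that R² is the standard 1 − SS_res/SS_tot and that "correctly identified" means the overlap between the top-k set ranked by GCMC and the top-k set ranked by the model. That second reading is our assumption, and the paper may define it differently. The function names `r2_score` and `top_k_hits` and the toy numbers are hypothetical. The sketch does not reproduce the XGBoost model, the seven descriptors, or the GCMC simulations.

```python
"""Sketch of two evaluation metrics for an ML surrogate of GCMC results.

Assumptions (not taken from the paper): R^2 is the standard coefficient of
determination, and the "top-k hit count" is the size of the overlap between
the k highest-ranked MOFs by GCMC and the k highest-ranked by the model.
All names and data here are hypothetical.
"""
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def top_k_hits(y_true: Sequence[float], y_pred: Sequence[float], k: int) -> int:
    """Count samples ranked in the top k by both the reference values and the predictions."""
    if len(y_true) != len(y_pred) or not 0 < k <= len(y_true):
        raise ValueError("sequences must match in length and k must be in 1..n")
    idx = range(len(y_true))
    # Indices of the k largest reference (GCMC) values and the k largest predictions.
    top_true = set(sorted(idx, key=lambda i: y_true[i], reverse=True)[:k])
    top_pred = set(sorted(idx, key=lambda i: y_pred[i], reverse=True)[:k])
    return len(top_true & top_pred)


if __name__ == "__main__":
    # Toy values standing in for GCMC-simulated and model-predicted Xe/Kr selectivities.
    gcmc = [12.0, 3.5, 8.1, 20.4, 5.0, 15.2]
    model = [11.1, 4.0, 11.5, 18.7, 6.2, 13.9]
    print(f"R^2 = {r2_score(gcmc, model):.3f}")
    print("top-3 hits:", top_k_hits(gcmc, model, 3))
```

Under this reading, the abstract's 888 and 896 hits at k = 1000 are hit rates of 88.8% and 89.6%. The hit count depends only on rankings, so it can still be high when R² is modest, and the reverse can also happen. That is one reason to report both metrics.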