Zhong Shifa, Chen Yushan, Li Jibai, Igou Thomas, Xiong Anyue, Guan Jian, Dai Zhenhua, Cai Xuanying, Qu Xintong, Chen Yongsheng
Department of Environmental Science, Institute of Eco-Chongming, School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, P. R. China.
School of Civil & Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
Environ Sci Technol Lett. 2024 Sep 10;11(11):1193-1199. doi: 10.1021/acs.estlett.4c00524. eCollection 2024 Nov 12.
Screening ionic liquids (ILs) with low viscosity, low toxicity, and high CO₂ absorption using machine learning (ML) models is crucial for mitigating global warming. However, when candidate ILs fall into the extrapolation zone of ML models, predictions may become unreliable, leading to poor decision-making. In this study, we introduce a "representation uncertainty" (RU) approach that quantifies prediction uncertainty using four IL representations: molecular fingerprint, molecular descriptor, molecular image, and molecular graph. We develop four types of ML models based on these representations and calculate RU as the standard deviation of the predictions across these models. RU outperforms traditional model uncertainty (MU), which is based on hyperparameter variations within a single representation, in identifying unreliable predictions across four IL property data sets: viscosity, toxicity, refractive index, and CO₂ absorption capacity. Furthermore, we build ensemble models from the four types of models, and these ensembles outperform the individual models. Using the RU approach, we screened 1420 ILs and identified 37 promising candidates with low viscosity, low toxicity, and high CO₂ absorption capacity. The predictive performance of our ensemble model and the effectiveness of the RU-based approach were experimentally validated by measuring the CO₂ absorption capacity of 14 ILs. This study not only offers a more reliable method for screening and designing ILs, accelerating the discovery process, but also introduces a new perspective on developing ensemble models with enhanced predictive performance.
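The RU calculation and the RU-gated screening described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the four representation-specific models have already produced predictions. It treats the ensemble prediction as the unweighted mean of the four (the abstract does not state the combination rule) and computes RU as the population standard deviation. The function names, IL names, property cut-offs, RU thresholds, and toy numbers are all hypothetical.

```python
from statistics import fmean, pstdev

# The four IL representations named in the abstract; these key strings are illustrative.
REPRESENTATIONS = ("fingerprint", "descriptor", "image", "graph")


def ensemble_and_ru(preds):
    """Combine one property's predictions for one IL from the four models.

    preds maps each representation to that model's prediction.
    Assumptions (not stated in the paper): the ensemble value is the
    unweighted mean, and RU is the population standard deviation (ddof=0).
    """
    values = [preds[rep] for rep in REPRESENTATIONS]
    return fmean(values), pstdev(values)


def screen(candidates, passes, ru_max):
    """Return ILs whose ensemble predictions meet every criterion and whose
    RU stays at or below a per-property threshold.

    candidates: {il_name: {property: {representation: prediction}}}
    passes:     {property: callable(ensemble_value) -> bool}
    ru_max:     {property: largest tolerated RU}  (hypothetical thresholds)
    """
    selected = []
    for name, props in candidates.items():
        keep = True
        for prop, criterion in passes.items():
            mean, ru = ensemble_and_ru(props[prop])
            # Large RU means the four representations disagree, treated here
            # as a sign that the IL lies in the models' extrapolation zone.
            if ru > ru_max[prop] or not criterion(mean):
                keep = False
                break
        if keep:
            selected.append(name)
    return selected


def uniform(value):
    """Helper for the toy data: the same prediction from all four models."""
    return {rep: value for rep in REPRESENTATIONS}


# Toy predictions (made-up numbers in arbitrary units, for illustration only).
candidates = {
    "IL-A": {
        "viscosity": {"fingerprint": 40.0, "descriptor": 42.0, "image": 38.0, "graph": 40.0},
        "co2": {"fingerprint": 0.30, "descriptor": 0.32, "image": 0.28, "graph": 0.30},
    },
    "IL-B": {
        "viscosity": {"fingerprint": 40.0, "descriptor": 120.0, "image": 15.0, "graph": 60.0},
        "co2": {"fingerprint": 0.30, "descriptor": 0.32, "image": 0.28, "graph": 0.30},
    },
    "IL-C": {"viscosity": uniform(50.0), "co2": uniform(0.10)},
}
# Hypothetical screening rules: low viscosity, high CO2 uptake, bounded RU.
passes = {"viscosity": lambda v: v < 100.0, "co2": lambda c: c > 0.25}
ru_max = {"viscosity": 10.0, "co2": 0.05}

print(screen(candidates, passes, ru_max))  # → ['IL-A']
```

In this toy case IL-B passes the viscosity cut-off on its ensemble mean alone (58.75), but its four predictions disagree strongly (RU ≈ 38.8), so it is rejected as a likely extrapolation. IL-C is consistent across models but fails the CO₂ criterion.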