LAQV/REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, 2829-516 Caparica, Portugal.
Mol Inform. 2020 Sep;39(9):e2000001. doi: 10.1002/minf.202000001. Epub 2020 May 29.
The increasing application of new ionic liquids (ILs) creates a need for liquid-liquid equilibria data for both miscible and quasi-immiscible systems. In this study, equilibrium concentrations at different temperatures for ionic liquid + water two-phase systems were modeled using a Quantitative Structure-Property Relationship (QSPR) method. Data on equilibrium concentrations were taken from the ILThermo Ionic Liquids database, curated, and used to build models that predict the weight fraction of water in the ionic-liquid-rich phase and the weight fraction of ionic liquid in the aqueous phase as two separate properties. The major modeling challenge stems from the fact that each IL is characterized by several data points, since equilibrium concentrations are temperature dependent. Thus, new approaches for the detection of potential data point outliers, test set selection, and quality prediction were developed. The training sets comprised equilibrium concentration data for 67 ILs (water in IL) and 68 ILs (IL in water). SiRMS, MOLMAPS, Rcdk and Chemaxon descriptors were used to build Random Forest models for both properties. Models were subjected to the Y-scrambling test to assess robustness. The best models were also validated using an external test set that is not part of the ILThermo database. A two-phase equilibrium diagram for one of the ILs in the external test set is presented for better visualization of the results and potential derivation of tie lines.
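The abstract does not describe how the test set was selected or how the Y-scrambling test was run, so the following is only a generic sketch of the two ideas, not the authors' procedure. It assumes records of the form (IL identifier, temperature, weight fraction), keeps all temperature points of one IL on the same side of the split, and illustrates Y-scrambling with a one-variable least-squares line standing in for the Random Forest models. The function names (`split_by_il`, `r_squared`, `y_scrambling`) and all numeric values are hypothetical placeholders, not ILThermo data.

```python
import random
from collections import defaultdict


def split_by_il(records, test_fraction=0.2, seed=0):
    """Split (il_id, temperature_K, weight_fraction) records by IL, so that
    every data point of a given IL lands in the same subset."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    il_ids = sorted(groups)
    random.Random(seed).shuffle(il_ids)
    n_test = max(1, round(test_fraction * len(il_ids)))
    test = [r for il in il_ids[:n_test] for r in groups[il]]
    train = [r for il in il_ids[n_test:] for r in groups[il]]
    return train, test


def r_squared(x, y):
    """R^2 of a one-variable least-squares line fitted to (x, y).
    Stand-in model only; the study itself used Random Forest."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def y_scrambling(x, y, n_rounds=20, seed=0):
    """Refit after randomly permuting the responses; a robust model should
    score clearly better on the real responses than on scrambled ones."""
    rng = random.Random(seed)
    real_score = r_squared(x, y)
    scrambled_scores = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)
        scrambled_scores.append(r_squared(x, y_perm))
    return real_score, scrambled_scores


# Synthetic placeholder data: 10 hypothetical ILs, 4 temperatures each.
records = [
    (f"IL{i}", 288.15 + 10 * t, 0.01 * (i + 1) + 0.001 * t)
    for i in range(10)
    for t in range(4)
]
train, test = split_by_il(records)

# Synthetic, perfectly linear descriptor/response pair for the scrambling demo.
x_demo = [float(v) for v in range(20)]
y_demo = [2.0 * v + 1.0 for v in x_demo]
real_score, scrambled_scores = y_scrambling(x_demo, y_demo)
```

Splitting by IL rather than by individual data point avoids having the same compound, measured at a neighboring temperature, appear in both the training and the test subset, which would make the test error look better than it is.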