Olayiwola Teslim, Kumar Revati, Romagnoli Jose A
Cain Department of Chemical Engineering, Louisiana State University, Baton Rouge, Louisiana 70803, United States.
Department of Chemistry, Louisiana State University, Baton Rouge, Louisiana 70803, United States.
Ind Eng Chem Res. 2024 Jun 29;63(27):11971-11981. doi: 10.1021/acs.iecr.4c01171. eCollection 2024 Jul 10.
Data-driven models have been applied successfully to engineering tasks such as material design, process modeling, and process monitoring. For capacitive devices such as deionization cells and supercapacitors, data-driven machine learning (ML) models could help optimize their use in energy-efficient separations or energy generation. However, the available datasets are limited, and even large datasets are incomplete, which restricts their use for data-driven modeling. Here, transfer learning was exploited to address the challenge of limited datasets. A two-step data-driven ML modeling framework, in which a model is trained first on ML-imputed datasets and then on clean datasets, was explored. Through data imputation and transfer learning, it is possible to develop a data-driven model whose predictions mirror experimental measurements with acceptable metrics. Using this model, optimization studies with a genetic algorithm were carried out to analyze the solutions under Pareto optimality. This early insight can be used in the initial stage of experimental work to rapidly identify experimental conditions worthy of further investigation. Moreover, we expect that the insights from these results will drive accurate predictive modeling in other fields with incomplete datasets, including healthcare, genomic data analysis, and environmental monitoring.
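The two-step idea in the abstract (pretrain on imputed data, then continue training on clean data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, a simple k-nearest-neighbor fill stands in for the ML imputation step, a linear model trained by gradient descent stands in for the predictive model, and the names `knn_impute`, `train`, and `mse` are hypothetical. Only the Python standard library is used.

```python
import random

def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        obs = [j for j, v in enumerate(r) if v is not None]
        # Squared distance is measured only over the columns this row has.
        donors = sorted(
            complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in obs)
        )[:k]
        filled.append([
            v if v is not None else sum(d[j] for d in donors) / len(donors)
            for j, v in enumerate(r)
        ])
    return filled

def predict(w, x):
    # w holds one weight per feature followed by a bias term.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]

def mse(w, X, y):
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def train(X, y, w=None, lr=0.05, epochs=300):
    """Gradient descent on MSE; pass w to continue from pretrained weights."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1) if w is None else list(w)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j in range(n_feat):
                grad[j] += 2 * err * x[j] / len(y)
            grad[-1] += 2 * err / len(y)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Synthetic stand-in for an incomplete experimental dataset (assumption).
random.seed(0)
true_w = [1.5, -2.0, 0.5]
X_full = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [sum(a * b for a, b in zip(true_w, x)) + 0.3 + random.gauss(0, 0.05)
     for x in X_full]
# Remove roughly 20% of the feature entries at random.
X_missing = [[None if random.random() < 0.2 else v for v in x] for x in X_full]

# Step 1: impute, then pretrain on the full (imputed) dataset.
X_imputed = knn_impute(X_missing, k=5)
w_pre = train(X_imputed, y, lr=0.05, epochs=300)

# Step 2: fine-tune on rows that were complete to begin with.
clean_idx = [i for i, r in enumerate(X_missing) if None not in r]
X_clean = [X_missing[i] for i in clean_idx]
y_clean = [y[i] for i in clean_idx]
w_fine = train(X_clean, y_clean, w=w_pre, lr=0.01, epochs=200)

print("clean-data MSE after pretraining:", mse(w_pre, X_clean, y_clean))
print("clean-data MSE after fine-tuning:", mse(w_fine, X_clean, y_clean))
```

The second stage starts from the pretrained weights and uses a smaller learning rate, a common convention in transfer learning so that fine-tuning adjusts rather than overwrites what was learned from the larger imputed set. A practical implementation would likely use an established numerical or ML library and a nonlinear model.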