Mansoori Noora Al, Shaik Munawar Abdul, Sivaramakrishnan Kaushik
Department of Chemical and Petroleum Engineering, UAE University, Khalifa Street, Al Ain 15551, United Arab Emirates (UAE).
ACS Omega. 2025 Jul 2;10(27):29836-29855. doi: 10.1021/acsomega.5c04463. eCollection 2025 Jul 15.
This study explores the use of machine learning (ML) techniques to predict Fourier-transform infrared (FTIR) intensities of products from the thermal cracking of Athabasca bitumen, with the aim of developing a reliable soft sensor. The ultimate goal is to obtain FTIR spectra of the thermally cracked products online, avoiding the delays of slow physical measurements. Six ML models were implemented as a faster alternative to those measurements: linear regression (LinR), partial least-squares regression (PLSR), support vector regression (SVR), k-nearest neighbors (k-NN), random forest (RF), and gradient boosting regression (GBR). To assess generalization, the models were trained and tested in four scenarios using data from visbreaking experiments performed on Athabasca bitumen at temperatures ranging from 25 to 420 °C and reaction times ranging from 15 min to 27 h. Scenario 1 used all 61,740 data points with an 80/20 train-test split and 10-fold cross-validation (CV). Scenario 2 trained on 25, 350, and 400 °C and tested on 300, 380, and 420 °C. Scenario 3 trained on 350, 380, and 400 °C and tested on 25, 300, and 420 °C. Finally, Scenario 4 trained on 25, 300, 350, and 380 °C and tested on 400 and 420 °C. Bayesian optimization was employed for hyperparameter tuning to identify the optimal configuration for each model. The results indicate that ensemble methods, particularly GBR, consistently achieved the highest predictive accuracy (R²) and lowest root mean squared error (RMSE) across all scenarios. In Scenario 1, GBR achieved a prediction accuracy of 99.66%.
Scenario 2 highlighted the models' ability to generalize across temperatures, with RF and GBR both reaching prediction accuracies of around 94%. Scenario 3, characterized by significant temperature variability, demonstrated the robustness of GBR, which outperformed RF and k-NN with a predictive accuracy of 92.15%. Scenario 4, which predicts high-temperature behavior from lower-temperature training data, showed that GBR still performed robustly, with a predictive accuracy of 80.40%. The study concludes that GBR models, particularly those with well-tuned hyperparameters, are highly effective in predicting FTIR intensities, outperforming other techniques such as RF, k-NN, LinR, and PLSR. The integration of advanced ML techniques with Bayesian optimization significantly enhances the capability to predict FTIR spectra, providing a reliable soft sensor as an alternative to traditional physical experimentation. This approach not only saves time and resources but also ensures consistent, high-quality predictive performance in chemical analysis and monitoring.
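The temperature-based splits of Scenarios 2 to 4 and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' code: the row layout (a dict with a `temperature_c` key), the names `SCENARIOS`, `split_by_temperature`, `rmse`, and `r2_score` are hypothetical, and reading the reported "predictive accuracy" as the coefficient of determination expressed as a percentage is an assumption. Only the temperature sets are taken from the abstract.

```python
import math

# Training/testing temperatures (°C) for Scenarios 2-4, as listed in the abstract.
# Scenario 1 (random 80/20 split with 10-fold CV) is not covered by this sketch.
SCENARIOS = {
    2: {"train": {25, 350, 400}, "test": {300, 380, 420}},
    3: {"train": {350, 380, 400}, "test": {25, 300, 420}},
    4: {"train": {25, 300, 350, 380}, "test": {400, 420}},
}


def split_by_temperature(rows, scenario):
    """Split rows into (train, test) lists using each row's 'temperature_c'.

    `rows` is a hypothetical list of dicts, one per measurement; the key
    name 'temperature_c' is an assumption made for this example.
    """
    temps = SCENARIOS[scenario]
    train = [r for r in rows if r["temperature_c"] in temps["train"]]
    test = [r for r in rows if r["temperature_c"] in temps["test"]]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Holding out whole temperatures, rather than random rows, is what lets Scenarios 2 to 4 test whether a model handles conditions it never saw during training; a random split such as Scenario 1 places every temperature in both sets.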