Uusitalo Pekka, Sorsa Aki, Russo Abegão Fernando, Ohenoja Markku, Ruusunen Mika
Environmental and Chemical Engineering Research Unit, Control Engineering Group, Faculty of Technology, P.O. Box 4300, University of Oulu, Oulu 90014, Finland.
School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom.
Ind Eng Chem Res. 2022 Apr 13;61(14):4752-4762. doi: 10.1021/acs.iecr.1c03995. Epub 2022 Mar 31.
Catalyst development for biorefining applications involves many challenges, and mathematical modeling can be seen as an essential tool for helping to explain catalyst performance. This paper studies several machine learning (ML) methods that can model the performance of heterogeneous catalysts from relevant descriptors. A systematic approach is taken to selecting the most appropriate ML method, with a focus on variable selection, for which regularization algorithms were applied. Several candidate model structures were compared, and the results were interpreted. The systematic modeling approach aims to highlight the necessary tools and considerations for inexperienced users of ML. Literature datasets for the hydrogenation of 5-ethoxymethylfurfural with simple bimetallic catalysts, including main metals and promoters, were studied together with additional catalyst descriptors found in the literature. The best models gave good results for estimating conversion, selectivity, and yield, with correlations between 0.90 and 0.98. The best identified model structures were support vector regression, Gaussian process regression, and decision tree methods. In general, variable selection procedures were found to improve model performance. The applied modeling methods thus appear to have strong potential for aiding catalyst development, based mainly on the information content of descriptor datasets.
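The abstract says regularization algorithms were used for variable selection but does not name the algorithm, so the following is only a minimal sketch of the general idea. It assumes LASSO (L1) regularization solved by coordinate descent. The data are synthetic and generated purely for illustration; they are not the literature datasets from the paper. The function names, the penalty value `lam`, and the choice of six descriptors are all hypothetical. The response is built from descriptors 0 and 3 only, so a working selection step should discard the others.

```python
import random

def standardize(columns):
    """Scale each descriptor column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        scaled.append([(v - mean) / std for v in col])
    return scaled

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1.

    Assumes standardized columns (so x_j.x_j / n == 1) and a centered y.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = columns[j]
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(col[i] * residual[i] for i in range(n)) / n + w[j]
            new_w = soft_threshold(rho, lam)
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w

# Synthetic stand-in for a descriptor table: 200 samples, 6 descriptors.
random.seed(0)
n_samples, n_descriptors = 200, 6
raw = [[random.gauss(0.0, 1.0) for _ in range(n_samples)]
       for _ in range(n_descriptors)]
X = standardize(raw)

# Hypothetical response: depends only on descriptors 0 and 3, plus small noise.
y = [3.0 * X[0][i] - 2.0 * X[3][i] + random.gauss(0.0, 0.1)
     for i in range(n_samples)]
y_mean = sum(y) / n_samples
y = [v - y_mean for v in y]

weights = lasso_coordinate_descent(X, y, lam=0.3)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", [round(wj, 3) for wj in weights])
print("selected descriptor indices:", selected)
```

In a workflow like the one the abstract describes, the selected descriptors would then be passed to the candidate model structures (for example support vector regression, Gaussian process regression, or decision trees) and the candidates compared on held-out data. The penalty strength would normally be tuned by cross-validation rather than fixed by hand as it is here.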