Li Runzhao, Herreros Jose Martin, Tsolakis Athanasios, Yang Wenzhao
Department of Mechanical Engineering, School of Engineering, College of Engineering and Physical Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom.
J Mol Graph Model. 2022 Mar;111:108083. doi: 10.1016/j.jmgm.2021.108083. Epub 2021 Nov 22.
Soot formation models are becoming increasingly important in formulating advanced renewable fuels for soot reduction. This work evaluates the performance of machine learning (ML) and deep learning (DL) in predicting the yield sooting index (YSI) from chemical structure, and proposes a tailor-made convolutional neural network (CNN), SDSeries38, for regression problems. In the ML approach, a novel quantitative structure-property relationship (QSPR) is developed for feature extraction, and an ML algorithm builds the relationship between molecular structure and YSI. In the DL approach, SDSeries38 contains nine feature learning modules and one regression module for automated feature learning and regression. It adopts a standard series network architecture with a modular structure; each feature learning module is a stack of convolution, batch normalization, activation, and pooling layers. The ML-QSPR model outperforms SDSeries38 in accuracy (RMSE = 7.563 vs 19.58) and computational speed, and it also applies to fuel mixtures. Within DL, SDSeries38 outperforms 10 classical CNNs and provides a generic architecture that can be transferred to other regression problems. The application of DL to regression is still in its infancy, and there is no complete guide on how to develop CNN architectures specifically for regression. Several gaps remain: (1) CNN architectures developed specifically for regression are required; (2) directly transferring classical CNN architectures from classification to regression gives only modest performance, and a modular structure with typical function modules may provide an ideal solution; (3) deepening the sequence of convolution layers improves predictive accuracy, but the number of layers should be kept below the threshold at which vanishing gradients occur.
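The modular series layout described in the abstract can be illustrated with a minimal, framework-free sketch. It only enumerates the layer order; it is not the authors' implementation. The names `Layer`, `feature_module`, `regression_module`, and `build_series_network` are hypothetical, and the contents of the regression module are an assumption, since the abstract does not list them. No filter sizes, channel counts, or other hyperparameters are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    kind: str    # layer type, e.g. "conv" or "pool"
    module: str  # name of the module the layer belongs to


def feature_module(index):
    """One feature learning module: conv -> batch norm -> activation -> pooling."""
    name = f"feature_{index}"
    order = ("conv", "batchnorm", "activation", "pool")
    return [Layer(kind, name) for kind in order]


def regression_module():
    # Assumption: the abstract does not say which layers form the regression
    # module; a fully connected layer feeding a regression output is used
    # here purely for illustration.
    return [
        Layer("fully_connected", "regression"),
        Layer("regression_output", "regression"),
    ]


def build_series_network(n_feature_modules=9):
    """Chain the modules one after another (a plain series network, no branches)."""
    layers = []
    for i in range(1, n_feature_modules + 1):
        layers.extend(feature_module(i))
    layers.extend(regression_module())
    return layers


network = build_series_network()
for layer in network[:4]:
    print(layer.module, layer.kind)
```

Because every feature learning module has the same four-layer shape, changing the depth is a matter of changing `n_feature_modules`, which is one way to read the abstract's point that a modular structure makes the architecture transferable to other regression problems.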