Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich, Switzerland.
Analyst. 2012 Apr 7;137(7):1604-10. doi: 10.1039/c2an15972d. Epub 2012 Feb 16.
Modern analytical chemistry of industrial products needs rapid, robust, and inexpensive analytical methods for continuously monitoring product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products on-line or in-line. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR) spectroscopy, is one of the best ways to obtain information about the chemical structure and quality parameters of multicomponent mixtures. Combined with chemometric algorithms and multivariate data analysis (MDA) methods, which were developed specifically for complex, noisy, and overlapping signals, NIR spectroscopy achieves high accuracy as measured by the classical root-mean-square error of prediction (RMSEP). However, it is unclear whether combined NIR + MDA methods can cope with the much more demanding interpolation and extrapolation problems that inevitably arise in real-world applications. In the current study, we make a general comparison of the robustness of three groups of regression methods: linear methods, such as partial least squares or projection to latent structures (PLS); "quasi-nonlinear" methods, such as the polynomial version of PLS (Poly-PLS); and intrinsically nonlinear methods, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM). As a measure of robustness, we estimate the accuracy of each method on interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative real-world samples. Six very different chemical systems, differing in complexity, composition, structure, and properties, were studied: gasoline, ethanol-gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total.
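RMSEP and the interpolation/extrapolation tests used here as a robustness measure can be illustrated with a short sketch. It assumes a single scalar reference property per sample and builds two hypothetical test sets: an extrapolation set made of the samples with the highest property values (outside the calibration range), and an interpolation set cut from the middle of the property range (a gap inside it). The function names `rmsep`, `extrapolation_split`, and `interpolation_split` and the default split fractions are illustrative assumptions, not the authors' protocol.

```python
import math
from typing import List, Sequence, Tuple


def rmsep(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error of prediction on an independent test set."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def _order_by_value(y: Sequence[float]) -> List[int]:
    # Sample indices sorted by ascending reference property value.
    return sorted(range(len(y)), key=lambda i: y[i])


def extrapolation_split(
    y: Sequence[float], frac: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Hypothetical scheme: test on the highest-valued samples only."""
    if len(y) < 2:
        raise ValueError("need at least two samples")
    order = _order_by_value(y)
    # Keep at least one sample in each of the train and test sets.
    n_test = min(len(y) - 1, max(1, round(frac * len(y))))
    return order[:-n_test], order[-n_test:]


def interpolation_split(
    y: Sequence[float], lo_q: float = 0.4, hi_q: float = 0.6
) -> Tuple[List[int], List[int]]:
    """Hypothetical scheme: test on a band cut from the middle of the range."""
    order = _order_by_value(y)
    lo, hi = int(lo_q * len(y)), int(hi_q * len(y))
    if not 0 < lo < hi < len(y):
        raise ValueError("band must leave calibration samples on both sides")
    # Calibration keeps both ends of the range; the test band lies in between.
    return order[:lo] + order[hi:], order[lo:hi]
```

Both splits sort samples by the property value, so the test samples lie either beyond the calibration range (extrapolation) or in a gap inside it (interpolation). Any regression model (PLS, Poly-PLS, ANN, SVR, LS-SVM) can then be calibrated on the `train` indices and scored with `rmsep` on the `test` indices, which is a sketch of the kind of comparison described here rather than a reproduction of it.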
General conclusions are drawn about the applicability of ANN- and SVM-based regression tools in modern analytical chemistry. The relative effectiveness of the multivariate algorithms changes when the criterion shifts from classical accuracy to robustness. Neural networks, which can produce very accurate results with respect to classical RMSEP, fail to solve interpolation problems and, especially, extrapolation problems. Chemometric methods based on the support vector machine (SVM) approach can solve both classical regression and interpolation/extrapolation tasks.