Department of Analytical Chemistry, University of Valencia, Edificio Jerónimo Muñoz, 50th Dr. Moliner, 46100 Burjassot, Spain.
Anal Bioanal Chem. 2011 Jan;399(3):1305-14. doi: 10.1007/s00216-010-4457-2. Epub 2010 Nov 30.
The selection of an appropriate calibration set is a critical step in multivariate method development. This work discusses how the use of different calibration sets, chosen on the basis of a prior classification of the unknown samples, affects the performance of partial least squares (PLS) regression models. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil) were employed, with the polymerized triacylglyceride (PTG) content of the samples increasing over the course of a deep-frying process. A one-class-classifier partial least squares-discriminant analysis (PLS-DA) combined with a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff were classified correctly, independent of their PTG content. However, class separation was less evident for oil samples fried with foodstuff. The PLS-DA classification models were validated by double-cross model validation combined with permutation testing, which confirmed the results. To assess the usefulness of selecting an appropriate PLS calibration set, the PTG content was determined using PLS models calculated from the previously selected classes. Compared with a PLS model calculated from a pooled calibration set containing samples from all classes, the PLS models based on the calibration sets selected by PLS-DA significantly improved the root mean square error of prediction, which ranged between 1.06 and 2.91% (w/w).
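The comparison between a pooled calibration set and class-specific calibration sets can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic and noise is a fixed alternating offset, the class labels are taken as already known (standing in for the PLS-DA classification step), and a univariate least-squares line on a single made-up spectral feature stands in for the PLS regression models. It is not the authors' procedure and reproduces none of their results; it only shows how the root mean square error of prediction (RMSEP) is computed and why routing each sample to a model calibrated on its own class can lower it.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def fit_line(x, y):
    """Univariate least-squares fit; a simple stand-in for a PLS model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]

# Hypothetical (slope, intercept) linking the feature to PTG content for
# each oil class. These numbers are invented, not taken from the paper.
TRUE_PARAMS = {"olive": (1.0, 0.5), "sunflower": (2.0, 0.0), "corn": (3.0, -0.5)}

def simulate(xs, slope, intercept):
    """Synthetic responses with a small alternating, deterministic residual."""
    return [slope * x + intercept + (0.05 if i % 2 == 0 else -0.05)
            for i, x in enumerate(xs)]

x_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_test = [1.5, 2.5, 3.5, 4.5]
cal = {c: (x_cal, simulate(x_cal, *p)) for c, p in TRUE_PARAMS.items()}
test = {c: (x_test, simulate(x_test, *p)) for c, p in TRUE_PARAMS.items()}

# Pooled calibration: one model fitted to the samples of all classes.
pooled_x = [x for xs, _ in cal.values() for x in xs]
pooled_y = [y for _, ys in cal.values() for y in ys]
pooled_model = fit_line(pooled_x, pooled_y)

# Class-specific calibration: one model per class.
class_models = {c: fit_line(xs, ys) for c, (xs, ys) in cal.items()}

# Predict every test sample with both strategies. The class label is
# assumed to come from a prior classification step.
y_true, y_pooled, y_class = [], [], []
for c, (xs, ys) in test.items():
    y_true += ys
    y_pooled += predict(pooled_model, xs)
    y_class += predict(class_models[c], xs)

rmsep_pooled = rmsep(y_true, y_pooled)
rmsep_class = rmsep(y_true, y_class)
print(f"RMSEP, pooled calibration set:     {rmsep_pooled:.3f}")
print(f"RMSEP, class-specific calibration: {rmsep_class:.3f}")
```

In this toy setup the classes were deliberately given different slopes, so the pooled line fits none of them well. With real spectra the gain depends on how strongly the relationship between spectrum and PTG content differs between classes and on how reliable the classification step is.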