Pierce Karisa M, Hope Janiece L, Johnson Kevin J, Wright Bob W, Synovec Robert E
Department of Chemistry, Box 351700, University of Washington, Seattle, WA 98195, USA.
J Chromatogr A. 2005 Nov 25;1096(1-2):101-10. doi: 10.1016/j.chroma.2005.04.078.
A fast and objective chemometric classification method is developed and applied to the analysis of gas chromatography (GC) data from five commercial gasoline samples. The gasoline samples serve as model mixtures; the focus is on developing and demonstrating the classification method. The method is based on objective retention time alignment (referred to as piecewise alignment) coupled with analysis of variance (ANOVA) feature selection, followed by classification by principal component analysis (PCA) using optimal parameters. The degree-of-class-separation is used as a metric both to optimize the alignment and feature selection parameters objectively with a suitable training set, thereby reducing user subjectivity, and to indicate the success of the PCA clustering and classification. The degree-of-class-separation is calculated using Euclidean distances between the PCA scores of a subset of the replicate runs from two of the five fuel types, i.e., the training set. The unaligned training set submitted directly to PCA had a low degree-of-class-separation (0.4), and the PCA scores plot for the raw training set combined with the raw test set failed to cluster the five sample types correctly. After the training set was submitted to piecewise alignment, the degree-of-class-separation increased (1.2), but when the same alignment parameters were applied to the training set combined with the test set, the scores plot still did not yield five distinct groups. Applying feature selection to the unaligned training set increased the degree-of-class-separation (4.8), but chemical variations were still obscured by retention time variation, and when the same feature selection conditions were used for the training set combined with the test set, only one of the five fuels was clustered correctly. However, piecewise alignment coupled with feature selection yielded a reasonably optimal degree-of-class-separation for the training set (9.2), and when the same alignment and ANOVA parameters were applied to the training set combined with the test set, the PCA scores plot correctly classified the gasoline fingerprints into five distinct clusters.
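The feature selection and scoring steps can be illustrated with a minimal NumPy sketch on synthetic data. It is not the authors' implementation. The abstract does not give the formula for the degree-of-class-separation, so the definition used here (distance between the two class centroids in PCA score space divided by the mean within-class distance to the centroid) is an assumption. Keeping the top-k points by F-ratio, the function names, and the simulated chromatograms are also assumptions made for illustration. Piecewise alignment is omitted.

```python
import numpy as np

def anova_f_ratio(X, labels):
    """One-way ANOVA F-ratio for every column (retention-time point) of X."""
    classes = np.unique(labels)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[labels == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    ms_between = between / (k - 1)
    ms_within = within / (n - k)
    return ms_between / np.maximum(ms_within, 1e-12)  # guard against division by zero

def pca_scores(X, n_components=2):
    """PCA scores of mean-centered data, computed via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def class_separation(scores, labels):
    """ASSUMED metric: centroid-to-centroid Euclidean distance divided by
    the mean within-class Euclidean distance to the class centroid."""
    a, b = np.unique(labels)  # expects exactly two classes
    Sa, Sb = scores[labels == a], scores[labels == b]
    between = np.linalg.norm(Sa.mean(axis=0) - Sb.mean(axis=0))
    within = np.concatenate([
        np.linalg.norm(Sa - Sa.mean(axis=0), axis=1),
        np.linalg.norm(Sb - Sb.mean(axis=0), axis=1),
    ]).mean()
    return between / within

# Synthetic two-class training set: 4 replicates per class, 300 points each.
# Only the first 6 points carry a real class difference; the rest is noise.
rng = np.random.default_rng(0)
n_rep, n_points, n_signal = 4, 300, 6
labels = np.repeat([0, 1], n_rep)
X = rng.normal(0.0, 1.0, size=(2 * n_rep, n_points))
X[labels == 1, :n_signal] += 4.0

# PCA on all points versus PCA on the points with the highest F-ratios.
sep_raw = class_separation(pca_scores(X), labels)
f_ratio = anova_f_ratio(X, labels)
selected = np.argsort(f_ratio)[-n_signal:]
sep_selected = class_separation(pca_scores(X[:, selected]), labels)

print(f"separation without feature selection: {sep_raw:.2f}")
print(f"separation with feature selection:    {sep_selected:.2f}")
```

In this toy setting, the number of retained points (or an F-ratio threshold) would be the parameter tuned against the separation metric on the training set, in the spirit of the optimization the abstract describes.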