Department of Analytical Chemistry and Organic Chemistry, Rovira i Virgili University, C/ Marcel·lí Domingo s/n, 43007 Tarragona, Spain.
Anal Chim Acta. 2010 Apr 1;664(1):27-33. doi: 10.1016/j.aca.2010.01.059. Epub 2010 Feb 6.
This work describes multi-class classification based on binary probabilistic discriminant partial least squares (p-DPLS) models, built with a one-against-one strategy and combined by the winner-takes-all principle. The multi-class problem is split into binary classification problems, each handled by a p-DPLS model, and the results of these models are combined to obtain the final classification. The classification criterion uses the specific characteristics of each object (its position in the multivariate space and its prediction uncertainty) to estimate the reliability of the classification, so that the object is assigned to the class with the highest reliability. The methodology is tested on the well-known Iris data set and on a data set of Italian olive oils. Compared with CART and SIMCA, the proposed method shows better average classification performance and also provides a statistic that evaluates the reliability of the classification. For the olive oil set, the average percentage of correct classification on the training set was close to 84% with p-DPLS, against 75% with CART and 100% with SIMCA; on the test set it was close to 94% with p-DPLS, against 50% with CART and 62% with SIMCA.
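The one-against-one decomposition and winner-takes-all assignment described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' p-DPLS implementation. The function names (`one_against_one`, `toy_binary_model`), the toy one-dimensional class centres, the logistic reliability, and the choice to average each class's reliability over its pairwise models are all assumptions made here for illustration.

```python
from itertools import combinations
from math import exp

# Hypothetical 1-D class centres; toy values, not taken from the Iris data.
CENTRES = {"setosa": 1.0, "versicolor": 4.0, "virginica": 6.0}


def toy_binary_model(a, b, x):
    """Stand-in for one binary p-DPLS model (assumption, not the real method).

    Returns a pair (reliability of class a, reliability of class b) for
    object x, derived from the distances of x to the two class centres.
    """
    dist_a = abs(x - CENTRES[a])
    dist_b = abs(x - CENTRES[b])
    rel_a = 1.0 / (1.0 + exp(dist_a - dist_b))  # closer to a -> rel_a near 1
    return rel_a, 1.0 - rel_a


def one_against_one(classes, binary_model, x):
    """Classify x with one binary model per pair of classes.

    Each class collects the reliabilities it receives from the pairwise
    models it takes part in; the class with the highest mean reliability
    wins (winner-takes-all). Averaging is an assumption of this sketch.
    """
    collected = {c: [] for c in classes}
    for a, b in combinations(classes, 2):
        rel_a, rel_b = binary_model(a, b, x)
        collected[a].append(rel_a)
        collected[b].append(rel_b)
    mean_rel = {c: sum(v) / len(v) for c, v in collected.items()}
    winner = max(mean_rel, key=mean_rel.get)
    return winner, mean_rel


winner, reliabilities = one_against_one(list(CENTRES), toy_binary_model, 1.2)
print(winner, reliabilities)
```

With three classes the loop builds three pairwise models, and the returned dictionary keeps a reliability value for every class, so a caller could flag objects whose winning reliability is low rather than only reading off the label.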