Song Weiran, Wang Hui, Maguire Paul, Nibouche Omar
School of Computing and Mathematics, Ulster University, BT37 0QB, Newtownabbey, Co. Antrim, UK.
Anal Chim Acta. 2018 Jun 7;1009:27-38. doi: 10.1016/j.aca.2018.01.023.
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate methods for spectral data analysis: it extracts latent variables and uses them to predict responses. In particular, it handles high-dimensional and collinear spectral data well. However, PLS-DA does not explicitly address data multimodality, i.e., a multimodal distribution of the data within a class. In this paper, we present a novel method, termed nearest clusters based PLS-DA (NCPLS-DA), that addresses the multimodality and nonlinearity issues explicitly and improves the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide the samples into clusters and calculates the centre of every cluster. For a given query point, only the clusters whose centres are nearest to that point are used for PLS-DA. The method thus provides a simple and effective tool for separating multimodal and nonlinear classes into clusters that are locally linear and unimodal. Experimental results on 17 datasets, comprising 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time.