IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):2926-2937. doi: 10.1109/TNNLS.2019.2893190. Epub 2019 Feb 22.
Principal component analysis (PCA) has been used to study the pathogenesis of diseases. To enhance the interpretability of classical PCA, various improved PCA methods have been proposed. A typical example is sparse PCA, which seeks sparse loadings. However, the performance of these methods is still far from satisfactory because they are unsupervised and the class ambiguity within the samples is high. To overcome this problem, this paper develops a new PCA method named supervised discriminative sparse PCA (SDSPCA). The main innovation of this method is the incorporation of discriminative information and sparsity into the PCA model. Specifically, in contrast to traditional sparse PCA, which imposes sparsity on the loadings, here sparse components are obtained to represent the data. Furthermore, via a linear transformation, the sparse components approximate the given label information. On the one hand, sparse components improve interpretability over traditional PCA; on the other hand, they have discriminative abilities suitable for classification purposes. A simple algorithm is developed, and its convergence proof is provided. SDSPCA is applied to common-characteristic gene selection and tumor classification on multiview biological data. The sparsity and classification performance of SDSPCA are verified empirically through extensive experiments, and the results demonstrate that SDSPCA outperforms other state-of-the-art methods.
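The abstract describes the model only in words. One way to formalize that description is sketched below. The notation, the choice of an $\ell_{2,1}$ penalty for sparsity, and the weights $\alpha$ and $\beta$ are assumptions made here for illustration; none of them is stated in the abstract, and the paper's exact formulation may differ.

```latex
% Assumed notation (not given in the abstract):
%   X \in \mathbb{R}^{m \times n} : data matrix, m features (genes), n samples
%   B \in \mathbb{R}^{c \times n} : class-indicator (label) matrix, c classes
%   Q \in \mathbb{R}^{n \times k} : sparse components
%   Y \in \mathbb{R}^{m \times k},\; A \in \mathbb{R}^{c \times k} : linear maps
\min_{Y,\,A,\,Q}\;
  \underbrace{\lVert X - Y Q^{\top} \rVert_F^2}_{\text{components represent the data}}
  \;+\; \alpha\,
  \underbrace{\lVert B - A Q^{\top} \rVert_F^2}_{\text{components approximate the labels}}
  \;+\; \beta\,
  \underbrace{\lVert Q \rVert_{2,1}}_{\text{sparsity}}
\quad \text{s.t.} \quad Q^{\top} Q = I_k
```

Under these assumptions, the orthogonality constraint gives the minimizers $Y = XQ$ and $A = BQ$ for fixed $Q$. Replacing the $\ell_{2,1}$ norm by its standard reweighted surrogate $\mathrm{tr}(Q^{\top} D Q)$, with $D$ diagonal and $D_{ii} = 1/(2\lVert q^{i} \rVert_2)$ for the $i$-th row $q^{i}$ of $Q$ (constant factors absorbed into $\beta$), reduces the problem to minimizing $\mathrm{tr}\big(Q^{\top}(\beta D - X^{\top}X - \alpha B^{\top}B)\,Q\big)$. For fixed $D$ this is solved by the $k$ eigenvectors with the smallest eigenvalues, and alternating that step with an update of $D$ yields a simple iterative scheme of the kind the abstract mentions.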