Graph-based sparse linear discriminant analysis for high-dimensional classification

Authors

Liu Jianyu, Yu Guan, Liu Yufeng

Affiliations

Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA.

Department of Biostatistics, University at Buffalo, Buffalo, NY 14214, USA.

Publication information

J Multivar Anal. 2019 May;171:250-269. doi: 10.1016/j.jmva.2018.12.007. Epub 2018 Dec 17.

DOI:10.1016/j.jmva.2018.12.007

PMID:31983784

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6980367/

Abstract

Linear discriminant analysis (LDA) is a well-known classification technique that has enjoyed great success in practical applications. Despite its effectiveness for traditional low-dimensional problems, extensions of LDA are necessary in order to classify high-dimensional data. Many variants of LDA have been proposed in the literature. However, most of these methods do not fully incorporate the structure information among predictors when such information is available. In this paper, we introduce a new high-dimensional LDA technique, namely graph-based sparse LDA (GSLDA), that utilizes the graph structure among the features. In particular, we use the regularized regression formulation for penalized LDA techniques, and propose to impose a structure-based sparse penalty on the discriminant vector. The graph structure can be either given or estimated from the training data. Moreover, we explore the relationship between the within-class feature structure and the overall feature structure. Based on this relationship, we further propose a variant of GSLDA that effectively utilizes unlabeled data, which can be abundant in the semi-supervised learning setting. With the new regularization, we can obtain a sparse estimate of the discriminant vector and more accurate and interpretable classifiers than many existing methods. Both the selection consistency of the discriminant vector estimate and the convergence rate of the classifier are established, and the resulting classifier asymptotically attains the Bayes error rate. Finally, we demonstrate the competitive performance of the proposed GSLDA on both simulated and real data studies.
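
To make the idea of a regression formulation with a structure-based sparse penalty more concrete, the following is a minimal Python sketch for the two-class case. It is not the authors' algorithm: the abstract does not state the form of the penalty, so the sketch assumes one possible choice, a latent overlapping group lasso in which each group is a feature together with its neighbors in the graph, solved by proximal gradient descent on duplicated columns. The function names (`neighborhoods`, `graph_sparse_lda`, `predict`), the label coding, the midpoint threshold (which presumes roughly balanced classes), the tuning value `lam`, and the simulated chain graph are all illustrative assumptions.

```python
import numpy as np

def neighborhoods(adj):
    """Group j = feature j plus its neighbors in the feature graph (assumed grouping)."""
    return [np.union1d([j], np.flatnonzero(adj[j])) for j in range(adj.shape[0])]

def graph_sparse_lda(X, y, adj, lam=0.1, n_iter=500):
    """Least-squares LDA with a neighborhood-based group penalty (illustrative only)."""
    n, p = X.shape
    n1 = int(y.sum())
    n0 = n - n1
    # Centered label coding: the codes sum to zero over the sample.
    z = np.where(y == 1, n0 / n, -n1 / n)
    Xc = X - X.mean(axis=0)
    groups = neighborhoods(adj)
    # Duplicate columns so overlapping groups become disjoint blocks.
    idx = np.concatenate(groups)
    bounds = np.cumsum([0] + [len(g) for g in groups])
    Xe = Xc[:, idx]
    # Step size 1/L for the smooth part (1 / (2n)) * ||z - Xe v||^2.
    step = n / np.linalg.norm(Xe, 2) ** 2
    v = np.zeros(Xe.shape[1])
    for _ in range(n_iter):
        u = v - step * (Xe.T @ (Xe @ v - z)) / n
        # Block soft-thresholding: the proximal map of the group penalty.
        for k, g in enumerate(groups):
            s = slice(bounds[k], bounds[k + 1])
            w = lam * step * np.sqrt(len(g))
            nrm = np.linalg.norm(u[s])
            u[s] = 0.0 if nrm <= w else (1.0 - w / nrm) * u[s]
        v = u
    # Sum the latent block vectors back into one discriminant vector.
    beta = np.zeros(p)
    np.add.at(beta, idx, v)
    # Midpoint threshold between projected class means (assumes balanced classes).
    mid = 0.5 * (X[y == 1].mean(axis=0) + X[y == 0].mean(axis=0))
    return beta, float(mid @ beta)

def predict(X, beta, threshold):
    """Assign class 1 when the projection exceeds the threshold."""
    return (X @ beta > threshold).astype(int)

# Simulated example: 20 features on a chain graph, only the first 3 informative.
rng = np.random.default_rng(0)
p = 20
adj = np.zeros((p, p), dtype=int)
for i in range(p - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, p))
X[y == 1, :3] += 2.0
beta, thr = graph_sparse_lda(X, y, adj, lam=0.1)
print("selected features:", np.flatnonzero(beta))
print("training accuracy:", (predict(X, beta, thr) == y).mean())
```

In this sketch, a feature can receive a nonzero coefficient only through an active group, so selected features tend to form connected pieces of the graph. With a graph that has no edges, every group is a single feature and the penalty reduces to an ordinary lasso penalty.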

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/13ab/6980367/4466e53f572c/nihms-1518090-f0003.jpg
