Hannan Yang, D. Y. Lin, and Quefeng Li
Department of Biostatistics, University of North Carolina at Chapel Hill.
Stat Sin. 2023 May;33(SI):1343-1364. doi: 10.5705/ss.202021.0028.
High-dimensional classification is an important statistical problem with applications in many areas. One widely used classifier is linear discriminant analysis (LDA). In recent years, many regularized LDA classifiers have been proposed to solve the high-dimensional classification problem. However, these methods rely on inverting a large matrix or solving large-scale optimization problems to obtain the classification rule, which is computationally prohibitive when the dimension is ultra-high. With the emergence of big data, it is increasingly important to develop more efficient algorithms for the high-dimensional LDA problem. In this paper, we propose an efficient greedy search algorithm that relies solely on closed-form formulae to learn a high-dimensional LDA rule. We establish theoretical guarantees for its statistical properties in terms of variable selection and error-rate consistency; in addition, under some mild distributional assumptions, we provide an explicit interpretation of the extra information brought by an additional feature in an LDA problem. We demonstrate that the new algorithm drastically improves computational speed relative to other high-dimensional LDA methods, while maintaining comparable or even better classification performance.
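To make the idea of a greedy, closed-form search concrete, the following is a minimal Python sketch of forward feature selection for binary LDA. It is not the authors' algorithm: the function name greedy_lda_selection, the parameters max_features and ridge, and the use of the estimated squared Mahalanobis distance between class means as the selection criterion are illustrative assumptions made here; the paper derives its own closed-form update formulae, selection criterion, and stopping rule.

```python
import numpy as np

def greedy_lda_selection(X, y, max_features=10, ridge=1e-6):
    """Forward-greedy feature selection for binary LDA (illustrative sketch).

    At each step, add the feature that most increases the estimated squared
    Mahalanobis distance between the two class means, computed on the
    currently selected feature set. This is a generic sketch, not the
    paper's exact algorithm or its closed-form updates.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    delta = X1.mean(axis=0) - X0.mean(axis=0)              # mean difference
    # Pooled sample covariance of the two classes.
    S = ((n0 - 1) * np.cov(X0, rowvar=False) +
         (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)

    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(min(max_features, X.shape[1])):
        best_j, best_gain = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            # Small ridge term keeps the submatrix invertible.
            S_sub = S[np.ix_(idx, idx)] + ridge * np.eye(len(idx))
            d_sub = delta[idx]
            # Squared Mahalanobis distance restricted to the candidate set.
            gain = d_sub @ np.linalg.solve(S_sub, d_sub)
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Usage on synthetic data (only the first 5 features carry signal).
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n)
X[y == 1, :5] += 1.0
print(greedy_lda_selection(X, y, max_features=5))
```

A naive implementation like this re-solves a linear system for every candidate feature; the appeal of a closed-form greedy rule, as described in the abstract, is that each candidate can instead be evaluated with an explicit update formula, avoiding both large matrix inversion and large-scale optimization.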