Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization.

Authors

Liquet Benoit, Moka Sarat, Muller Samuel

Affiliations

School of Mathematical and Physical Sciences, Macquarie University, Sydney, Australia.

Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l'Adour, Pau, France.

Publication information

Biom J. 2025 Feb;67(1):e70015. doi: 10.1002/bimj.70015.

DOI: 10.1002/bimj.70015

PMID: 39686707

Abstract

The selection of the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty of interpreting them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The usage of our algorithm is further illustrated through the analysis of two real data sets. The first data set is analyzed using principal component analysis, while the analysis of the second data set is based on the partial least squares framework.
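To make the notion of a best subset solution path concrete, the following is a minimal sketch in Python and is not the authors' algorithm. It assumes the usual cardinality-constrained formulation for the first component (maximize the variance v'Σv subject to ||v|| = 1 and at most k nonzero loadings) and solves it by exhaustive search over subsets, which is only feasible for a handful of variables. The abstract's continuous optimization algorithm is proposed precisely as an alternative to this kind of combinatorial search. The function name `best_subset_pca_path` and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np


def best_subset_pca_path(X, max_size):
    """Brute-force best subset path for the first principal component.

    For each subset size k, return the variable subset whose covariance
    submatrix has the largest leading eigenvalue, that eigenvalue, and the
    corresponding sparse loading vector. Illustrative only: the cost grows
    combinatorially with the number of variables.
    """
    Xc = X - X.mean(axis=0)                  # center the columns
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    p = cov.shape[0]
    path = {}
    for k in range(1, max_size + 1):
        best_var, best_subset, best_vec = -np.inf, None, None
        for subset in combinations(range(p), k):
            idx = list(subset)
            sub = cov[np.ix_(idx, idx)]
            # eigh returns eigenvalues in ascending order
            eigvals, eigvecs = np.linalg.eigh(sub)
            if eigvals[-1] > best_var:
                best_var = eigvals[-1]
                best_subset = subset
                best_vec = eigvecs[:, -1]
        loading = np.zeros(p)
        loading[list(best_subset)] = best_vec  # zeros outside the subset
        path[k] = (best_subset, best_var, loading)
    return path


# Simulated data (an assumption for illustration): the first three of six
# variables share a strong common factor, the rest are pure noise.
rng = np.random.default_rng(0)
n, p = 200, 6
latent = rng.normal(size=(n, 1))
X = rng.normal(size=(n, p))
X[:, :3] += 3.0 * latent

path = best_subset_pca_path(X, max_size=p)
for k, (subset, var, _) in path.items():
    print(k, subset, round(float(var), 3))
```

Reading the output along k shows which variables enter the component as the allowed subset size grows. At k = p the result coincides with the ordinary first principal component, whose loadings involve all original variables.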

