Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization.

Author information

Liquet Benoit, Moka Sarat, Muller Samuel

Affiliations

School of Mathematical and Physical Sciences, Macquarie University, Sydney, Australia.

Laboratoire de Mathématiques et de leurs Applications, Université de Pau et des Pays de l'Adour, Pau, France.

Publication information

Biom J. 2025 Feb;67(1):e70015. doi: 10.1002/bimj.70015.

Abstract

Selecting the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both are popular linear dimension-reduction methods with numerous applications in fields including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are linear combinations of all the original variables. A main drawback of principal components is that they are difficult to interpret when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The use of our algorithm is further illustrated through the analysis of two real data sets. The first data set is analyzed using principal component analysis, while the analysis of the second data set is based on the partial least squares framework.
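To make the notion of a "best subset" principal component concrete, the following is a minimal sketch that finds it by exhaustive search on a small problem. It is not the continuous optimization algorithm described in the abstract, whose details are not given here; it only illustrates the combinatorial target that such an algorithm would approximate. The function names `best_subset_pc` and `solution_path` are hypothetical, and the criterion (largest leading eigenvalue of the covariance submatrix) is an assumed, standard formulation of sparse PCA.

```python
from itertools import combinations

import numpy as np


def best_subset_pc(X, k):
    """Brute-force search for the k variables whose first principal
    component has the largest variance (assumed criterion)."""
    Xc = X - X.mean(axis=0)                  # center each column
    S = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    best_var, best_subset, best_vec = -np.inf, None, None
    for subset in combinations(range(X.shape[1]), k):
        idx = list(subset)
        # eigh returns eigenvalues in ascending order for symmetric input
        eigvals, eigvecs = np.linalg.eigh(S[np.ix_(idx, idx)])
        if eigvals[-1] > best_var:
            best_var, best_subset, best_vec = eigvals[-1], subset, eigvecs[:, -1]
    # embed the k loadings in a length-p vector, zeros elsewhere
    loading = np.zeros(X.shape[1])
    loading[list(best_subset)] = best_vec
    return best_subset, loading, best_var


def solution_path(X):
    """Best subset for every size k = 1, ..., p (feasible only for small p)."""
    return {k: best_subset_pc(X, k)[0] for k in range(1, X.shape[1] + 1)}
```

The search visits every subset of size k, so its cost grows combinatorially with the number of variables. That cost is what makes exhaustive search impractical in the high-dimensional settings the abstract targets.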

