Zhi X, Liu J, Wu S, Niu C
School of Science, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China.
School of Communication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an, People's Republic of China.
J Appl Stat. 2021 Sep 17;50(3):703-723. doi: 10.1080/02664763.2021.1975662. eCollection 2023.
Feature selection is an important data dimension reduction method, and it has been widely used in applications involving high-dimensional data, such as genetic data analysis and image processing. To achieve robust feature selection, recent works apply the ℓ2,1-norm or ℓ2,p-norm of a matrix to the loss function and the regularization term in regression, and have achieved encouraging results. However, these existing works rigidly set the matrix norms used in the loss function and the regularization term to the same ℓ2,1-norm or ℓ2,p-norm, which limits their applications. In addition, the solution algorithms they present either have high computational complexity and are not suitable for large data sets, or cannot provide satisfactory performance due to approximate calculation. To address these problems, we present a generalized ℓ2,p-norm regression based feature selection (ℓ2,p-RFS) method built on a new optimization criterion. The criterion extends the optimization criterion of ℓ2,1-norm robust feature selection (RFS) to the case where the loss function and the regularization term in regression use different matrix norms. We cast the new optimization criterion in a regression framework without regularization. In this framework, the new criterion can be solved by an iteratively re-weighted least squares (IRLS) procedure, in which each least squares subproblem is solved efficiently by the least squares QR (LSQR) algorithm. We have conducted extensive experiments evaluating the proposed algorithm on various well-known gene expression and image data sets, and compared it with other related feature selection methods.
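To illustrate the kind of IRLS-plus-LSQR scheme the abstract describes, the sketch below minimizes a generic objective ||XW − Y||_{2,q}^q + λ||W||_{2,p}^p by re-weighting rows of the residual and coefficient matrices, then solving each weighted least squares subproblem column-by-column with SciPy's LSQR. This is a minimal illustration under assumed notation, not the authors' exact criterion or update rules; the function name, smoothing constant `eps`, and stopping rule are all hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import lsqr


def irls_l2p_feature_select(X, Y, lam=1.0, p=1.0, q=2.0, n_iter=30, eps=1e-6):
    """Illustrative IRLS sketch (not the paper's exact algorithm) for
    min_W ||X W - Y||_{2,q}^q + lam * ||W||_{2,p}^p.

    At each iteration, row weights derived from the current residual rows
    (loss term) and coefficient rows (regularizer) turn the problem into a
    weighted least squares system, solved per output column with LSQR.
    eps smooths the weights to avoid division by zero at exact-zero rows.
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    for _ in range(n_iter):
        R = X @ W - Y
        # Square roots of the IRLS row weights: the weighted LS objective
        # sum_i u_i^2 ||x_i W - y_i||^2 + lam * sum_j v_j^2 ||w_j||^2
        # locally majorizes the l2,q / l2,p objective.
        u = (np.linalg.norm(R, axis=1) ** 2 + eps) ** ((q - 2) / 4)
        v = (np.linalg.norm(W, axis=1) ** 2 + eps) ** ((p - 2) / 4)
        # Stack the sqrt-weighted data rows and regularizer rows into one
        # tall LS system; LSQR only needs matrix-vector products with A.
        A = np.vstack([u[:, None] * X, np.sqrt(lam) * v[:, None] * np.eye(d)])
        for j in range(k):
            b = np.concatenate([u * Y[:, j], np.zeros(d)])
            W[:, j] = lsqr(A, b)[0]
    # Rank features by the row norms of W, as in regression-based selection.
    scores = np.linalg.norm(W, axis=1)
    return W, scores
```

In a practical implementation the stacked system would not be formed densely; LSQR accepts a `LinearOperator`, so the weighted products with X and the diagonal regularizer can be applied implicitly, which is what makes the approach attractive for large data sets.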