Feature Grouping and Selection Over an Undirected Graph.

Author information

Yang Sen, Yuan Lei, Lai Ying-Cheng, Shen Xiaotong, Wonka Peter, Ye Jieping

Affiliation

Computer Science and Engineering, Arizona State University, Tempe, AZ 85287, USA.

Publication information

KDD. 2012:922-930. doi: 10.1145/2339530.2339675.

Abstract

High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structure information on the features, has been considered promising for improving regression/classification performance. Graph-guided fused lasso (GFlasso) has recently been proposed to facilitate feature selection and graph structure exploitation when features exhibit certain graph structures. However, the formulation in GFlasso relies on pairwise sample correlations to perform feature grouping, which could introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves on the first by using a non-convex function to reduce the estimation bias. The third method extends the second with a truncated regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.
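
To make the three penalty families concrete, here is a minimal Python sketch of how such regularizers could be evaluated on a coefficient vector and an edge list. The abstract does not state the formulas, so the specific forms below are assumptions chosen to match its description: a pairwise max (an l-infinity norm over each connected pair) for the convex case, a difference of absolute values for the non-convex case, and a cap min(x/tau, 1) for the truncated case. The function names and the parameters `lam1`, `lam2`, and `tau` are hypothetical, and the sketch only evaluates penalties; it does not implement the ADMM or DC solvers.

```python
import numpy as np


def _split_edges(edges):
    """Return the two endpoint index arrays of an edge list [(i, j), ...]."""
    i, j = np.asarray(edges, dtype=int).T
    return i, j


def convex_pairwise_penalty(beta, edges, lam1, lam2):
    """Assumed convex form: L1 term plus max(|b_i|, |b_j|) summed over edges."""
    beta = np.asarray(beta, dtype=float)
    i, j = _split_edges(edges)
    pair = np.maximum(np.abs(beta[i]), np.abs(beta[j]))
    return lam1 * np.abs(beta).sum() + lam2 * pair.sum()


def nonconvex_pairwise_penalty(beta, edges, lam1, lam2):
    """Assumed non-convex form: L1 term plus | |b_i| - |b_j| | summed over edges."""
    beta = np.asarray(beta, dtype=float)
    i, j = _split_edges(edges)
    diff = np.abs(np.abs(beta[i]) - np.abs(beta[j]))
    return lam1 * np.abs(beta).sum() + lam2 * diff.sum()


def truncated_pairwise_penalty(beta, edges, lam1, lam2, tau):
    """Assumed truncated form: both terms are capped with min(x / tau, 1)."""
    beta = np.asarray(beta, dtype=float)
    i, j = _split_edges(edges)

    def cap(x):
        # Values above tau all cost 1, so large coefficients are not shrunk further.
        return np.minimum(x / tau, 1.0)

    diff = np.abs(np.abs(beta[i]) - np.abs(beta[j]))
    return lam1 * cap(np.abs(beta)).sum() + lam2 * cap(diff).sum()


# Toy example: a chain graph 0 - 1 - 2 with two equal-magnitude, opposite-sign
# coefficients and one zero coefficient.
beta = [1.0, -1.0, 0.0]
edges = [(0, 1), (1, 2)]
convex_value = convex_pairwise_penalty(beta, edges, lam1=1.0, lam2=1.0)
nonconvex_value = nonconvex_pairwise_penalty(beta, edges, lam1=1.0, lam2=1.0)
truncated_value = truncated_pairwise_penalty(beta, edges, lam1=1.0, lam2=1.0, tau=0.5)
```

In this toy example the edge (0, 1) contributes nothing to the assumed non-convex penalty even though the two coefficients have opposite signs, because only their magnitudes are compared. That is one reading of how a penalty of this kind could group features without relying on estimated sample correlations.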
