Robust Variable and Interaction Selection for Logistic Regression and General Index Models.

Author information

Li Yang, Liu Jun S

Affiliations

Yang Li is Sr. Market Scientist, Vatic Labs LLC, New York, NY 10036. Jun S Liu is Professor, Department of Statistics, Harvard University, Cambridge, MA 02138, and is also co-Director of the Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Publication information

J Am Stat Assoc. 2019;114(525):271-286. doi: 10.1080/01621459.2017.1401541. Epub 2018 Jun 28.

Abstract

Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds in predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms to optimize the extended Bayesian Information Criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can deal with high-dimensional data in which the number of predictors is much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on the Sliced Inverse Regression (SIR) (Li, 1991), SODA requires neither linearity nor constant variance condition and is thus more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate superior performances of SODA in dealing with non-Gaussian design matrices in both logistic and general index models.
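The backward stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' SODA implementation: the EBIC form used here (deviance plus k log n plus 2γ k log p, where k is the number of terms) is a common simplified version assumed for illustration, the forward stage is omitted, and the helper names `fit_logistic_loglik`, `design`, `ebic`, and `backward_eliminate` are hypothetical. Terms are encoded as tuples: `(j,)` for a main effect and `(j, k)` for the product of predictors j and k.

```python
import numpy as np

def fit_logistic_loglik(X, y, n_iter=50, ridge=1e-6):
    """Fit logistic regression by Newton's method; return the log-likelihood."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])          # prepend an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Z @ beta, -30, 30)
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)
        # Small ridge on the Hessian only, for numerical stability.
        H = Z.T @ (Z * w[:, None]) + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(H, Z.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    eta = np.clip(Z @ beta, -30, 30)
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def design(X, terms):
    """Build one column per term: x_j for (j,), x_j * x_k for (j, k)."""
    if not terms:
        return np.empty((X.shape[0], 0))
    return np.column_stack([np.prod(X[:, list(t)], axis=1) for t in terms])

def ebic(X, y, terms, gamma=0.5):
    """Simplified EBIC (assumed form): -2 logL + k log n + 2 gamma k log p."""
    n, p = X.shape
    k = len(terms)
    loglik = fit_logistic_loglik(design(X, terms), y)
    return -2.0 * loglik + k * np.log(n) + 2.0 * gamma * k * np.log(p)

def backward_eliminate(X, y, terms, gamma=0.5):
    """Greedily drop the term whose removal lowers EBIC most; stop when none does."""
    terms = list(terms)
    best = ebic(X, y, terms, gamma)
    while terms:
        scores = [ebic(X, y, terms[:i] + terms[i + 1:], gamma)
                  for i in range(len(terms))]
        i = int(np.argmin(scores))
        if scores[i] >= best:
            break
        best = scores[i]
        del terms[i]
    return terms, best

# Synthetic example with a non-Gaussian (uniform) design matrix.
rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.uniform(-2.0, 2.0, size=(n, p))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] * X[:, 2]   # one main effect, one interaction
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

start = [(0,), (1,), (2,), (3,), (1, 2), (0, 3), (3, 3)]
selected, score = backward_eliminate(X, y, start)
print(selected, round(score, 2))
```

Because each accepted removal strictly lowers the criterion, the returned score is never worse than that of the starting term set. A larger `gamma` penalizes model size more heavily, which matters when the number of candidate predictors is large relative to the sample size.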
