Robust Variable and Interaction Selection for Logistic Regression and General Index Models.

Author information

Li Yang, Liu Jun S

Affiliations

Yang Li is Senior Market Scientist, Vatic Labs LLC, New York, NY 10036. Jun S. Liu is Professor, Department of Statistics, Harvard University, Cambridge, MA 02138, and Co-Director of the Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China.

Publication information

J Am Stat Assoc. 2019;114(525):271-286. doi: 10.1080/01621459.2017.1401541. Epub 2018 Jun 28.

DOI:10.1080/01621459.2017.1401541
PMID:32863479
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7451675/
Abstract

Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds in predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms to optimize the extended Bayesian Information Criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can deal with high-dimensional data in which the number of predictors is much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on the Sliced Inverse Regression (SIR) (Li, 1991), SODA requires neither linearity nor constant variance condition and is thus more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate superior performances of SODA in dealing with non-Gaussian design matrices in both logistic and general index models.
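
The forward-backward idea in the abstract can be illustrated with a small sketch. This is not the authors' SODA implementation. It assumes an EBIC of the form -2 log L + k log n + 2γ[log C(p, k_main) + log C(p(p+1)/2, k_int)], scores a candidate variable's "overall effect" by adding its main effect, its square, and its interactions with already-selected variables as one block (a simplification), and in the backward stage removes single terms while the criterion improves. The helper names (`logistic_loglik`, `design`, `ebic`, `block`, `soda_like_selection`), the default γ = 0.5, and the simulated coefficients are hypothetical choices for illustration.

```python
import math

import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function: 1 / (1 + exp(-eta)) == 0.5 * (1 + tanh(eta / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def logistic_loglik(Z, y, max_iter=50, tol=1e-8):
    """Maximized logistic log-likelihood for design matrix Z, fitted by Newton-Raphson."""
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        prob = sigmoid(Z @ beta)
        grad = Z.T @ (y - prob)
        hess = Z.T @ (Z * (prob * (1.0 - prob))[:, None])
        # Tiny diagonal jitter only stabilizes the solve; the fixed point (grad == 0) is unchanged.
        step = np.linalg.solve(hess + 1e-8 * np.eye(Z.shape[1]), grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    # log L = sum(y * eta - log(1 + exp(eta))), with logaddexp for stability.
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def design(X, terms):
    """Intercept plus one column per term: (j,) is X_j, (j, k) is X_j * X_k (j == k gives X_j^2)."""
    cols = [np.ones(X.shape[0])]
    for t in terms:
        cols.append(X[:, t[0]] if len(t) == 1 else X[:, t[0]] * X[:, t[1]])
    return np.column_stack(cols)


def ebic(X, y, terms, gamma=0.5):
    """Assumed EBIC: -2 log L + k log n + 2*gamma*[log C(p, k_main) + log C(p(p+1)/2, k_int)]."""
    n, p = X.shape
    k_main = sum(1 for t in terms if len(t) == 1)
    k_int = len(terms) - k_main
    loglik = logistic_loglik(design(X, terms), y)
    log_models = math.log(math.comb(p, k_main)) + math.log(math.comb(p * (p + 1) // 2, k_int))
    return -2.0 * loglik + len(terms) * math.log(n) + 2.0 * gamma * log_models


def block(j, selected):
    """Hypothetical 'overall effect' block for variable j: main effect, square,
    and interactions with every already-selected variable."""
    return [(j,), (j, j)] + [(min(j, k), max(j, k)) for k in selected]


def soda_like_selection(X, y, gamma=0.5):
    """Simplified forward-backward search that minimizes the assumed EBIC."""
    p = X.shape[1]
    selected, terms = [], []
    best = ebic(X, y, terms, gamma)
    # Forward stage: add whole variables while the EBIC keeps decreasing.
    while len(selected) < p:
        scores = {j: ebic(X, y, terms + block(j, selected), gamma)
                  for j in range(p) if j not in selected}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        terms += block(j_best, selected)
        selected.append(j_best)
        best = scores[j_best]
    # Backward stage: drop single terms (main or interaction) while the EBIC keeps decreasing.
    while terms:
        scores = [ebic(X, y, terms[:i] + terms[i + 1:], gamma) for i in range(len(terms))]
        i_best = int(np.argmin(scores))
        if scores[i_best] >= best:
            break
        best = scores[i_best]
        del terms[i_best]
    return sorted(terms), best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 6
    X = rng.standard_normal((n, p))
    # Simulated truth: main effects of X_0 and X_1 plus their interaction; X_2..X_5 are noise.
    eta = 1.5 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1]
    y = (rng.random(n) < sigmoid(eta)).astype(float)
    print(soda_like_selection(X, y))
```

On data like the simulated example, the sketch typically keeps (0,), (1,) and (0, 1) and ignores the noise columns. Because it refits a full logistic regression for every candidate, it is meant only to show the add-then-prune logic, not to scale to settings where p is much larger than n.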

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/b6dc4e02d197/nihms-1501132-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/6891416dffec/nihms-1501132-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/3c213b5e9aae/nihms-1501132-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/101c76be3e75/nihms-1501132-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/74626b306e29/nihms-1501132-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/de6658a6cbbb/nihms-1501132-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e8f/7451675/f6b5b54361ef/nihms-1501132-f0007.jpg

Similar articles

1
Robust Variable and Interaction Selection for Logistic Regression and General Index Models.
J Am Stat Assoc. 2019;114(525):271-286. doi: 10.1080/01621459.2017.1401541. Epub 2018 Jun 28.
2
Forward regression for Cox models with high-dimensional covariates.
J Multivar Anal. 2019 Sep;173:268-290. doi: 10.1016/j.jmva.2019.02.011. Epub 2019 Mar 5.
3
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Stat Methods Med Res. 2013 Oct;22(5):505-18. doi: 10.1177/0962280211428385. Epub 2011 Nov 28.
4
Selection Consistency of Lasso-Based Procedures for Misspecified High-Dimensional Binary Model and Random Regressors.
Entropy (Basel). 2020 Jan 28;22(2):153. doi: 10.3390/e22020153.
5
Sparse sliced inverse regression for high dimensional data analysis.
BMC Bioinformatics. 2022 May 7;23(1):168. doi: 10.1186/s12859-022-04700-3.
6
Variable screening via quantile partial correlation.
J Am Stat Assoc. 2017;112(518):650-663. doi: 10.1080/01621459.2016.1156545. Epub 2017 Mar 30.
7
Fast forward selection for generalized estimating equations with a large number of predictor variables.
Biometrics. 2014 Mar;70(1):110-20. doi: 10.1111/biom.12118. Epub 2013 Dec 18.
8
An empirical approach to model selection through validation for censored survival data.
J Biomed Inform. 2011 Aug;44(4):595-606. doi: 10.1016/j.jbi.2011.02.005. Epub 2011 Feb 16.
9
Variable Selection via Partial Correlation.
Stat Sin. 2017 Jul;27(3):983-996. doi: 10.5705/ss.202015.0473.
10
Joint Bayesian variable and graph selection for regression models with network-structured predictors.
Stat Med. 2016 Mar 30;35(7):1017-31. doi: 10.1002/sim.6792. Epub 2015 Oct 29.

Cited by

1
Epidemiology of Overweight and Obesity in Early Childhood in China and Associated Factors.
Diabetes Metab Syndr Obes. 2025 May 29;18:1809-1822. doi: 10.2147/DMSO.S493135. eCollection 2025.
2
Think and Choose! The Dual Impact of Label Information and Consumer Attitudes on the Choice of a Plant-Based Analog.
Foods. 2024 Jul 18;13(14):2269. doi: 10.3390/foods13142269.
3
Partitioning and aggregating cross-tissue and tissue-specific genetic effects to identify gene-trait associations.
Nat Commun. 2024 Jul 9;15(1):5769. doi: 10.1038/s41467-024-49924-4.
4
A Bayesian approach to differential edges with probabilistic interactions: applications in association and classification.
Bioinform Adv. 2023 Nov 24;3(1):vbad172. doi: 10.1093/bioadv/vbad172. eCollection 2023.
5
The Kendall interaction filter for variable interaction screening in high dimensional classification problems.
J Appl Stat. 2022 Feb 4;50(7):1496-1514. doi: 10.1080/02664763.2022.2031125. eCollection 2023.
6
Unified model-free interaction screening via CV-entropy filter.
Comput Stat Data Anal. 2023 Apr;180. doi: 10.1016/j.csda.2022.107684. Epub 2022 Dec 28.
7
An easy-to-use nomogram predicting overall survival of adult acute lymphoblastic leukemia.
Front Oncol. 2022 Sep 26;12:977119. doi: 10.3389/fonc.2022.977119. eCollection 2022.
8
High Survivorship of First-Generation Monarch Butterfly Eggs to Third Instar Associated with a Diverse Arthropod Community.
Insects. 2021 Jun 21;12(6):567. doi: 10.3390/insects12060567.
9
IMMIGRATE: A Margin-Based Feature Selection Method with Interaction Terms.
Entropy (Basel). 2020 Mar 2;22(3):291. doi: 10.3390/e22030291.
10
Robust genetic interaction analysis.
Brief Bioinform. 2019 Mar 25;20(2):624-637. doi: 10.1093/bib/bby033.

References

1
A LASSO FOR HIERARCHICAL INTERACTIONS.
Ann Stat. 2013 Jun;41(3):1111-1141. doi: 10.1214/13-AOS1096.
2
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
3
Evaluation of the lasso and the elastic net in genome-wide association studies.
Front Genet. 2013 Dec 4;4:270. doi: 10.3389/fgene.2013.00270. eCollection 2013.
4
CORRELATION PURSUIT: FORWARD STEPWISE VARIABLE SELECTION FOR INDEX MODELS.
J R Stat Soc Series B Stat Methodol. 2012 Nov 1;74(5):849-870. doi: 10.1111/j.1467-9868.2011.01026.x. Epub 2012 Apr 12.
5
Penalized classification using Fisher's linear discriminant.
J R Stat Soc Series B Stat Methodol. 2011 Nov;73(5):753-772. doi: 10.1111/j.1467-9868.2011.00783.x.
6
Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications.
Ann Appl Stat. 2010 Mar 1;4(1):396-421. doi: 10.1214/09-AOAS279.
7
Empirical Bayes Estimates for Large-Scale Prediction Problems.
J Am Stat Assoc. 2009 Sep 1;104(487):1015-1028. doi: 10.1198/jasa.2009.tm08523.
8
HIGH DIMENSIONAL VARIABLE SELECTION.
Ann Stat. 2009 Jan 1;37(5A):2178-2201. doi: 10.1214/08-aos646.
9
Bayesian inference of protein-protein interactions from biological literature.
Bioinformatics. 2009 Jun 15;25(12):1536-42. doi: 10.1093/bioinformatics/btp245. Epub 2009 Apr 15.
10
High Dimensional Classification Using Features Annealed Independence Rules.
Ann Stat. 2008;36(6):2605-2637. doi: 10.1214/07-AOS504.