Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification

Authors

Furmańczyk Konrad, Rejchel Wojciech

Affiliations

Institute of Information Technology, Warsaw University of Life Sciences (SGGW), Nowoursynowska 159, 02-776 Warszawa, Poland.

Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Chopina 12/18, 87-100 Toruń, Poland.

Publication information

Entropy (Basel). 2020 May 13;22(5):543. doi: 10.3390/e22050543.

DOI:10.3390/e22050543

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7517038/

Abstract

In this paper, we consider prediction and variable selection in misspecified binary classification models in the high-dimensional setting. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first applies penalized logistic regression to classification data that may not follow the logistic model. The second is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We investigate these two approaches thoroughly and provide conditions that guarantee their success in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
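The two approaches described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not taken from the paper: the penalty is taken to be the L1 (lasso) penalty, the solver is plain proximal gradient descent (ISTA), the data come from a probit-type model so that both the logistic and the linear fits are misspecified, and the names `fit_l1`, `predict`, `simulate` and the value `lam=0.1` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1(X, y, lam, loss, n_iter=300):
    """L1-penalized fit by proximal gradient descent; the intercept is unpenalized.

    loss="logistic": penalized logistic regression on labels in {0, 1}.
    loss="squared":  penalized least squares with the labels treated as numbers.
    """
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])
    lip = np.linalg.norm(Xa, 2) ** 2 / n   # Lipschitz constant of the squared-loss gradient
    if loss == "logistic":
        lip /= 4.0                          # logistic curvature is at most 1/4
    ybar = y.mean()
    b0 = np.log(ybar / (1.0 - ybar)) if loss == "logistic" else ybar
    b = np.zeros(p)
    for _ in range(n_iter):
        eta = b0 + X @ b
        fitted = 1.0 / (1.0 + np.exp(-eta)) if loss == "logistic" else eta
        resid = fitted - y
        b0 -= resid.mean() / lip            # gradient step for the intercept
        b = soft_threshold(b - X.T @ resid / (n * lip), lam / lip)
    return b0, b

def predict(X, b0, b, loss):
    """Classify as 1 when the fitted score exceeds the natural cutoff."""
    cutoff = 0.0 if loss == "logistic" else 0.5
    return (b0 + X @ b > cutoff).astype(int)

# Assumed simulation design: more predictors than observations, three of them relevant.
rng = np.random.default_rng(0)
n, n_test, p = 500, 2000, 1000
beta = np.zeros(p)
beta[:3] = [2.0, -2.0, 2.0]

def simulate(m):
    X = rng.standard_normal((m, p))
    # Probit-type labels: neither the logistic nor the linear model is correct.
    y = (X @ beta + rng.standard_normal(m) > 0).astype(float)
    return X, y

X_train, y_train = simulate(n)
X_test, y_test = simulate(n_test)

results = {}
for loss in ("logistic", "squared"):
    b0, b = fit_l1(X_train, y_train, lam=0.1, loss=loss)
    selected = np.flatnonzero(b)            # indices of predictors kept by the fit
    acc = np.mean(predict(X_test, b0, b, loss) == y_test)
    results[loss] = (b, selected, acc)
    print(loss, "selected:", selected, "test accuracy:", round(float(acc), 3))
```

In this toy setting both fits are run with the same penalty level so that their selected sets and test accuracies can be compared directly; in practice the penalty level would be tuned, for example by cross-validation.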
