Furmańczyk Konrad, Rejchel Wojciech
Institute of Information Technology, Warsaw University of Life Sciences (SGGW), Nowoursynowska 159, 02-776 Warszawa, Poland.
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Chopina 12/18, 87-100 Toruń, Poland.
Entropy (Basel). 2020 May 13;22(5):543. doi: 10.3390/e22050543.
In this paper, we consider prediction and variable selection in misspecified binary classification models in the high-dimensional setting. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first is to apply penalized logistic regression to classification data that possibly do not follow the logistic model. The second is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We thoroughly investigate these two approaches and provide conditions that guarantee their success in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
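The two approaches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names `lasso_labels` and `logistic_l1`, the L1 (Lasso) form of the penalty, the probit-type data generation (chosen so that the logistic model is misspecified), and the untuned penalty levels.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_labels(X, y, lam, n_iter=200):
    """L1-penalized linear regression with 0/1 labels treated as numbers.

    Coordinate descent on (1/2n)||yc - Xc b||^2 + lam * ||b||_1, where
    Xc and yc are the centered predictors and labels (hypothetical helper).
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    resid = y - y.mean()
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def logistic_l1(X, y, lam, n_iter=500):
    """L1-penalized logistic regression by proximal gradient descent.

    The intercept b0 is not penalized (hypothetical helper).
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        lin = np.clip(b0 + X @ beta, -30.0, 30.0)
        grad = 1.0 / (1.0 + np.exp(-lin)) - y
        b0 -= step * grad.mean()
        beta = soft_threshold(beta - step * (X.T @ grad) / n, step * lam)
    return b0, beta

# Assumed toy data: p > n, three relevant predictors, probit-type labels,
# so the logistic and the linear model are both misspecified.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(float)

beta_lin = lasso_labels(X, y, lam=0.08)
_, beta_log = logistic_l1(X, y, lam=0.08)
print("selected by linear fit:  ", np.flatnonzero(beta_lin))
print("selected by logistic fit:", np.flatnonzero(beta_log))
```

Variable selection here means reading off the nonzero coefficients of each fit. The penalty level `lam=0.08` is arbitrary; in practice it would be chosen by cross-validation or by a theoretically motivated rule.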