Qi Zhengling, Liu Yufeng
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill.
Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill.
Technometrics. 2019;61(2):176-186. doi: 10.1080/00401706.2018.1497544. Epub 2018 Sep 12.
Classification problems are common in practice. In this paper, we aim to develop classifiers that are as interpretable as linear classifiers while retaining the model flexibility of nonlinear classifiers. We propose convex bidirectional large margin classifiers to fill the gap between linear and general nonlinear classifiers for high-dimensional data. Our method also provides a new data visualization tool for the classification of high-dimensional data. The resulting bilinear projection structure makes the proposed classifier highly interpretable. We further consider additional shrinkage to approximate variable selection. Through analyses of simulated and real data in high-dimensional settings, our method is shown to have superior prediction performance and interpretability when the data contain potential subpopulations. The computer code for the proposed method is available as supplemental material.