Department of Statistics, Purdue University, West Lafayette, IN, USA.
Stat Methods Med Res. 2013 Oct;22(5):537-50. doi: 10.1177/0962280211428387. Epub 2011 Nov 23.
This article reviews several classification methods, with emphasis on high-dimensional, low-sample-size data. We discuss several desirable properties for classifiers in such settings: predictability, consistency, generality, stability, robustness and sparsity. Specifically, a good classifier should have a small prediction error (predictability); converge to the Bayes-rule classifier asymptotically (consistency); be stable when an observation is added or removed (generality); be stable across different data sets of the same kind (stochastic stability); be stable in the presence of a small number of contaminated observations (robustness); and use a small number of variables (interpretability or sparsity). Several simulation examples and real applications illustrate the usefulness of popular existing classifiers and compare their performance.
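Two of the properties listed in the abstract, a small prediction error and stability when an observation is removed, can be estimated empirically with a leave-one-out loop. The sketch below is only an illustration under stated assumptions: the nearest-centroid classifier, the function names (`fit_centroids`, `predict`, `leave_one_out_summary`) and the toy data set (6 observations, 8 features, only the first two informative) are hypothetical choices made here and are not taken from the article.

```python
def fit_centroids(X, y):
    """Return {label: centroid} for a nearest-centroid classifier."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Column-wise mean of the rows belonging to this class.
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label with the closest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_out_summary(X, y):
    """Return (leave-one-out error rate, fraction of predictions on the
    remaining observations that change when one observation is removed)."""
    full_model = fit_centroids(X, y)
    full_preds = [predict(full_model, x) for x in X]
    errors = flips = comparisons = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # Predictability: is the held-out observation misclassified?
        errors += predict(model, X[i]) != y[i]
        # Stability: do predictions for the other observations change?
        for j, x in enumerate(X):
            if j != i:
                comparisons += 1
                flips += predict(model, x) != full_preds[j]
    return errors / len(X), flips / comparisons


# Hypothetical toy data: more features (8) than observations (6);
# only the first two features separate the classes, the rest are noise.
X = [
    [0.1, -0.1, 0.2, 0.0, -0.2, 0.1, 0.0, 0.1],
    [-0.2, 0.2, 0.0, 0.1, 0.1, -0.1, 0.2, 0.0],
    [0.0, 0.1, -0.1, 0.2, 0.0, 0.2, -0.1, -0.2],
    [3.1, 2.9, 0.1, -0.2, 0.0, 0.1, 0.2, 0.0],
    [2.8, 3.2, -0.1, 0.0, 0.2, -0.2, 0.0, 0.1],
    [3.0, 3.0, 0.0, 0.1, -0.1, 0.0, -0.2, 0.2],
]
y = [0, 0, 0, 1, 1, 1]

loo_error, flip_rate = leave_one_out_summary(X, y)
print(loo_error, flip_rate)
```

On real high-dimensional data the two numbers will generally differ between classifiers, which is one simple way to compare them along these two properties; the other properties in the abstract (consistency, robustness, sparsity) need different diagnostics that this sketch does not cover.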