Some considerations of classification for high dimension low-sample size data.

Author information

1Department of Statistics, Purdue University, West Lafayette, IN, USA.

Publication information

Stat Methods Med Res. 2013 Oct;22(5):537-50. doi: 10.1177/0962280211428387. Epub 2011 Nov 23.

Abstract

We review in this article several classification methods, especially for high-dimensional and low-sample size data. We discuss several desirable properties for classifiers in such settings, including predictability, consistency, generality, stability, robustness and sparsity. Specifically, a good classifier should have a small prediction error (predictability); converge to the Bayes-rule classifier asymptotically (consistency); be stable when adding/removing an observation (generality); be stable for different data sets of the same kind (stochastic stability); be stable when there are a small number of contaminated observations (robustness); and have a small number of variables in the classifier (interpretability or sparsity). Several simulation examples and real applications are used to illustrate the usefulness of the existing popular classifiers and compare their performance.
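The properties listed in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the authors' method: it uses a soft-thresholded nearest-centroid rule as a stand-in classifier, and every name and parameter value (`simulate`, `fit`, `predict`, `n`, `p`, `n_signal`, `shift`, `threshold`) is an illustrative assumption. It estimates three of the quantities on simulated high-dimension, low-sample-size data: test error (predictability), the fraction of test predictions that change when one training observation is removed (stability under adding/removing an observation), and the number of variables retained (sparsity).

```python
import random

def simulate(n, p, n_signal, shift, rng):
    """Two balanced classes; only the first n_signal features differ in mean."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = shift if label == 1 else -shift
        X.append([rng.gauss(center if j < n_signal else 0.0, 1.0) for j in range(p)])
        y.append(label)
    return X, y

def fit(X, y, threshold):
    """Nearest-centroid rule with soft-thresholded class-mean differences."""
    p = len(X[0])
    rows0 = [x for x, c in zip(X, y) if c == 0]
    rows1 = [x for x, c in zip(X, y) if c == 1]
    mean0 = [sum(r[j] for r in rows0) / len(rows0) for j in range(p)]
    mean1 = [sum(r[j] for r in rows1) / len(rows1) for j in range(p)]
    weights = []
    for m0, m1 in zip(mean0, mean1):
        diff = m1 - m0
        shrunk = max(abs(diff) - threshold, 0.0)  # small differences become zero
        weights.append(shrunk if diff > 0 else -shrunk)
    midpoints = [(m0 + m1) / 2.0 for m0, m1 in zip(mean0, mean1)]
    return weights, midpoints

def predict(model, x):
    """Assign class 1 when the thresholded linear score is positive."""
    weights, midpoints = model
    score = sum(w * (xj - m) for w, xj, m in zip(weights, x, midpoints))
    return 1 if score > 0 else 0

# Assumed settings: 20 training samples, 200 features, 5 informative features.
rng = random.Random(0)
n, p, n_signal, shift, threshold = 20, 200, 5, 1.5, 1.0
X_train, y_train = simulate(n, p, n_signal, shift, rng)
X_test, y_test = simulate(200, p, n_signal, shift, rng)

full_model = fit(X_train, y_train, threshold)
full_pred = [predict(full_model, x) for x in X_test]

# Predictability: misclassification rate on independent test data.
test_error = sum(a != b for a, b in zip(full_pred, y_test)) / len(y_test)

# Sparsity: number of variables with a non-zero weight.
n_selected = sum(1 for w in full_model[0] if w != 0.0)

# Stability under removal: refit without each observation in turn and
# count how often a test prediction differs from the full-data model.
changed = 0
for i in range(n):
    X_loo = X_train[:i] + X_train[i + 1:]
    y_loo = y_train[:i] + y_train[i + 1:]
    loo_model = fit(X_loo, y_loo, threshold)
    changed += sum(predict(loo_model, x) != f for x, f in zip(X_test, full_pred))
instability = changed / (n * len(X_test))

print("test error:", test_error)
print("variables kept:", n_selected, "of", p)
print("leave-one-out prediction change rate:", instability)
```

Raising `threshold` in this sketch trades a sparser rule against the risk of discarding informative variables; the abstract itself does not prescribe any particular classifier or tuning value.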
