Some considerations of classification for high dimension low-sample size data.

Author information

Department of Statistics, Purdue University, West Lafayette, IN, USA.

Publication information

Stat Methods Med Res. 2013 Oct;22(5):537-50. doi: 10.1177/0962280211428387. Epub 2011 Nov 23.

Abstract

We review in this article several classification methods, especially for high-dimensional and low-sample size data. We discuss several desirable properties for classifiers in such settings, including predictability, consistency, generality, stability, robustness and sparsity. Specifically, a good classifier should have a small prediction error (predictability); converge to the Bayes-rule classifier asymptotically (consistency); be stable when adding/removing an observation (generality); be stable for different data sets of the same kind (stochastic stability); be stable when there are a small number of contaminated observations (robustness); and have a small number of variables in the classifier (interpretability or sparsity). Several simulation examples and real applications are used to illustrate the usefulness of the existing popular classifiers and compare their performance.
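As an informal illustration of two of the properties listed in the abstract (a small prediction error, and stability when one observation is removed), the sketch below simulates a small high-dimensional data set and estimates both quantities by leave-one-out refitting. The nearest-centroid classifier, the Gaussian simulation design, and every function name here are assumptions chosen for illustration; none of them is taken from the article.

```python
import random


def make_hdlss(n_per_class=10, dim=200, shift=0.5, seed=0):
    """Simulate two Gaussian classes (hypothetical design, not from the article).

    Class 1 differs from class 0 by a mean shift in the first 10 coordinates.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            if label == 1:
                for j in range(10):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_nearest_centroid(X, y):
    """Return a dict mapping each class label to its coordinate-wise mean."""
    model = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        model[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(sorted(model), key=lambda c: sqdist(model[c], x))


def loo_error(X, y):
    """Leave-one-out error: refit without observation i, then predict it."""
    mistakes = 0
    for i in range(len(X)):
        model = fit_nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        mistakes += predict(model, X[i]) != y[i]
    return mistakes / len(X)


def loo_flip_rate(X, y, X_test):
    """Share of test predictions that change when one training point is
    removed, averaged over all single-point removals."""
    base_model = fit_nearest_centroid(X, y)
    base = [predict(base_model, x) for x in X_test]
    flips = 0
    for i in range(len(X)):
        model = fit_nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        flips += sum(predict(model, x) != b for x, b in zip(X_test, base))
    return flips / (len(X) * len(X_test))


if __name__ == "__main__":
    X, y = make_hdlss(seed=0)
    X_test, _ = make_hdlss(n_per_class=25, seed=1)
    print("leave-one-out error:", loo_error(X, y))
    print("prediction flip rate:", loo_flip_rate(X, y, X_test))
```

With 20 observations in 200 dimensions, both numbers can vary noticeably from seed to seed, which is the small-sample instability that the listed properties are meant to capture.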

Similar articles

Comparative evaluation of classifiers in the presence of statistical interactions between features in high dimensional data settings.

Int J Biostat. 2012 Jun 28;8(1):Article 17. doi: 10.1515/1557-4679.1373.

Optimal number of features as a function of sample size for various classification rules.

Bioinformatics. 2005 Apr 15;21(8):1509-15. doi: 10.1093/bioinformatics/bti171. Epub 2004 Nov 30.

Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers.

Artif Intell Med. 2012 May;55(1):25-35. doi: 10.1016/j.artmed.2011.11.003. Epub 2011 Dec 27.

Boosting method for local learning in statistical pattern recognition.

Neural Comput. 2008 Nov;20(11):2792-838. doi: 10.1162/neco.2008.06-07-549.

Class-imbalanced classifiers for high-dimensional data.

Brief Bioinform. 2013 Jan;14(1):13-26. doi: 10.1093/bib/bbs006. Epub 2012 Mar 9.

LESS: a model-based classifier for sparse subspaces.

IEEE Trans Pattern Anal Mach Intell. 2005 Sep;27(9):1496-500. doi: 10.1109/TPAMI.2005.182.

Quantile-based classifiers.

Biometrika. 2016 Jun;103(2):435-446. doi: 10.1093/biomet/asw015. Epub 2016 May 23.

A Bayesian approach to joint feature selection and classifier design.

IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1105-11. doi: 10.1109/TPAMI.2004.55.

Classifier ensembles for fMRI data analysis: an experiment.

Magn Reson Imaging. 2010 May;28(4):583-93. doi: 10.1016/j.mri.2009.12.021. Epub 2010 Jan 21.

Cited by

A large-scale transcriptome-wide association study (TWAS) of 10 blood cell phenotypes reveals complexities of TWAS fine-mapping.

Genet Epidemiol. 2022 Feb;46(1):3-16. doi: 10.1002/gepi.22436. Epub 2021 Nov 15.

Morphological, fractal, and textural features for the blood cell classification: the case of acute myeloid leukemia.

Eur Biophys J. 2021 Dec;50(8):1111-1127. doi: 10.1007/s00249-021-01574-w. Epub 2021 Oct 12.

Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions.

Entropy (Basel). 2020 Nov 5;22(11):1257. doi: 10.3390/e22111257.

Machine learning approach yields epigenetic biomarkers of food allergy: A novel 13-gene signature to diagnose clinical reactivity.

PLoS One. 2019 Jun 19;14(6):e0218253. doi: 10.1371/journal.pone.0218253. eCollection 2019.
