Probabilistic classifiers with high-dimensional data.

Affiliation

Biometric Research Branch, National Cancer Institute, 9000 Rockville Pike, MSC 7434, Bethesda, MD 20892-7434, USA.

Publication information

Biostatistics. 2011 Jul;12(3):399-412. doi: 10.1093/biostatistics/kxq069. Epub 2010 Nov 17.

Abstract

For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n, large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to use the methods developed here to ensure that a probabilistic classifier is well calibrated or at least not "anticonservative". We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set.
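
The abstract does not spell out the evaluation measures, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes the common binned partition of the Brier score into a calibration term (how far the average predicted probability in a bin is from the observed class frequency) and a refinement term (how far the observed frequencies are from 0 or 1), and it assumes a plain k-fold scheme for obtaining out-of-fold probabilities. The names `calibration_refinement`, `cross_validated_probs`, `fit_predict`, and `prior_only` are hypothetical and introduced here for illustration.

```python
from collections import defaultdict


def calibration_refinement(probs, labels, n_bins=10):
    """Binned split of the Brier score into calibration and refinement.

    probs  : predicted probabilities of class 1, each in [0, 1]
    labels : observed classes, each 0 or 1
    Returns (calibration, refinement); lower is better for both.
    Assumption: equal-width bins, which makes the split approximate
    unless all forecasts in a bin are identical.
    """
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # Map p to a bin index; p == 1.0 falls into the last bin.
        k = min(int(p * n_bins), n_bins - 1)
        bins[k].append((p, y))

    n = len(probs)
    calibration = 0.0
    refinement = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_p = sum(p for p, _ in members) / n_k  # average forecast in the bin
        freq = sum(y for _, y in members) / n_k    # observed class-1 frequency
        calibration += n_k * (mean_p - freq) ** 2
        refinement += n_k * freq * (1.0 - freq)
    return calibration / n, refinement / n


def cross_validated_probs(X, y, fit_predict, k=5):
    """Out-of-fold class-1 probabilities from k-fold cross-validation.

    fit_predict(train_X, train_y, test_X) is a hypothetical callable that
    trains a classifier on the training rows only (including any feature
    selection) and returns one probability per test row.
    """
    n = len(y)
    out = [None] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            out[i] = p
    return out


def prior_only(train_X, train_y, test_X):
    """Toy classifier: predicts the training class-1 rate for every row."""
    rate = sum(train_y) / len(train_y)
    return [rate] * len(test_X)


if __name__ == "__main__":
    X = list(range(10))
    y = [0, 1] * 5
    probs = cross_validated_probs(X, y, prior_only, k=5)
    print(calibration_refinement(probs, y))
```

In this sketch a classifier that always predicts the class prior is well calibrated but poorly refined, which shows why the two criteria are assessed separately. In a small n, large p setting, any gene selection would need to happen inside `fit_predict` so that the held-out rows never influence it.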
