Probabilistic classifiers with high-dimensional data.

Affiliation

Biometric Research Branch, National Cancer Institute, 9000 Rockville Pike, MSC 7434, Bethesda, MD 20892-7434, USA.

Publication information

Biostatistics. 2011 Jul;12(3):399-412. doi: 10.1093/biostatistics/kxq069. Epub 2010 Nov 17.

DOI:10.1093/biostatistics/kxq069

PMID:21087946

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3138069/

Abstract

For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n, large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for the assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated, or at least not "anticonservative", using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set.
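
The abstract does not spell out its evaluation measures, so the following is only a rough sketch of the general idea: obtain cross-validated class probabilities and then summarize calibration and refinement. It assumes the standard binned decomposition of the Brier score into a calibration term and a refinement term, which may differ from the measures developed in the paper. The classifier (a diagonal linear discriminant), the simulated data, and all function names are hypothetical stand-ins chosen for illustration.

```python
import math
import random

def fit_diag_gaussian(X, y):
    """Hypothetical stand-in classifier: diagonal linear discriminant
    (class means per feature, pooled per-feature variance)."""
    p = len(X[0])
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in (0, 1)}
    means = {c: [sum(x[j] for x in g) / len(g) for j in range(p)]
             for c, g in groups.items()}
    var = []
    for j in range(p):
        ss = sum((x[j] - means[lab][j]) ** 2 for x, lab in zip(X, y))
        var.append(ss / max(len(X) - 2, 1) + 1e-6)  # small floor avoids division by zero
    prior1 = len(groups[1]) / len(X)
    return means, var, prior1

def predict_proba(model, x):
    """Estimated probability of class 1 under the fitted model."""
    means, var, prior1 = model
    log_odds = math.log(prior1 / (1 - prior1))
    for j, xj in enumerate(x):
        log_odds += ((xj - means[0][j]) ** 2 - (xj - means[1][j]) ** 2) / (2 * var[j])
    log_odds = max(min(log_odds, 700.0), -700.0)  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-log_odds))

def cross_val_probs(X, y, k=5, seed=0):
    """k-fold cross-validation: each sample is scored by a model that never saw it.
    Assumes every training fold contains both classes."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    probs = [None] * len(X)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_diag_gaussian([X[i] for i in train], [y[i] for i in train])
        for i in test:
            probs[i] = predict_proba(model, X[i])
    return probs

def calibration_refinement(probs, labels, n_bins=10):
    """Binned Brier-score decomposition.
    calibration: weighted squared gap between mean predicted probability
                 and observed class-1 frequency in each bin (0 is ideal).
    refinement:  weighted o*(1-o) within bins (smaller means sharper separation)."""
    bins = {}
    for p, o in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((p, o))
    n = len(probs)
    calib = refine = 0.0
    for members in bins.values():
        nk = len(members)
        p_bar = sum(p for p, _ in members) / nk
        o_bar = sum(o for _, o in members) / nk
        calib += nk * (p_bar - o_bar) ** 2 / n
        refine += nk * o_bar * (1 - o_bar) / n
    return calib, refine

# Simulated "small n, large p" data: 60 samples, 200 features, 10 informative.
rng = random.Random(1)
n, p = 60, 200
y = [i % 2 for i in range(n)]
X = [[rng.gauss(0.5 * lab if j < 10 else 0.0, 1.0) for j in range(p)] for lab in y]

probs = cross_val_probs(X, y)
calib, refine = calibration_refinement(probs, y)
print(f"calibration term: {calib:.3f}, refinement term: {refine:.3f}")
```

Because the probabilities come from held-out folds, the two terms describe how the classifier would behave on new samples rather than on its own training data. A large calibration term combined with probabilities concentrated near 0 or 1 would be one sign of the overconfident ("anticonservative") behavior the abstract warns about.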

Similar articles

On the statistical assessment of classifiers using DNA microarray data.

BMC Bioinformatics. 2006 Aug 19;7:387. doi: 10.1186/1471-2105-7-387.

A combinational feature selection and ensemble neural network method for classification of gene expression data.

BMC Bioinformatics. 2004 Sep 27;5:136. doi: 10.1186/1471-2105-5-136.

Sample size planning for developing classifiers using high-dimensional DNA microarray data.

Biostatistics. 2007 Jan;8(1):101-17. doi: 10.1093/biostatistics/kxj036. Epub 2006 Apr 13.

Pattern classification with class probability output network.

IEEE Trans Neural Netw. 2009 Oct;20(10):1659-73. doi: 10.1109/TNN.2009.2029103. Epub 2009 Sep 18.

Corrected small-sample estimation of the Bayes error.

Bioinformatics. 2003 May 22;19(8):944-51. doi: 10.1093/bioinformatics/btg144.

A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays.

BMC Bioinformatics. 2006 Nov 24;7:514. doi: 10.1186/1471-2105-7-514.

Bayesian variable selection for disease classification using gene expression data.

Bioinformatics. 2010 Jan 15;26(2):215-22. doi: 10.1093/bioinformatics/btp638. Epub 2009 Nov 17.

A multi-class predictor based on a probabilistic model: application to gene expression profiling-based diagnosis of thyroid tumors.

BMC Genomics. 2006 Jul 27;7:190. doi: 10.1186/1471-2164-7-190.

Multiclass cancer classification using gene expression profiling and probabilistic neural networks.

Pac Symp Biocomput. 2003:5-16.

Cited by

A gene signature that distinguishes conventional and leukemic nonnodal mantle cell lymphoma helps predict outcome.

Blood. 2018 Jul 26;132(4):413-422. doi: 10.1182/blood-2018-03-838136. Epub 2018 May 16.

Assessing rejection-related disease in kidney transplant biopsies based on archetypal analysis of molecular phenotypes.

JCI Insight. 2017 Jun 15;2(12). doi: 10.1172/jci.insight.94197.

Transcriptome assists prognosis of disease severity in respiratory syncytial virus infected infants.

Sci Rep. 2016 Nov 11;6:36603. doi: 10.1038/srep36603.

Predicting Progression from Mild Cognitive Impairment to Alzheimer's Dementia Using Clinical, MRI, and Plasma Biomarkers via Probabilistic Pattern Classification.

PLoS One. 2016 Feb 22;11(2):e0138866. doi: 10.1371/journal.pone.0138866. eCollection 2016.

Factors affecting the accuracy of a class prediction model in gene expression data.

BMC Bioinformatics. 2015 Jun 21;16:199. doi: 10.1186/s12859-015-0610-4.

Optimally splitting cases for training and testing high dimensional classifiers.

BMC Med Genomics. 2011 Apr 8;4:31. doi: 10.1186/1755-8794-4-31.
