Improved centroids estimation for the nearest shrunken centroid classifier.

Authors

Wang Sijian, Zhu Ji

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Bioinformatics. 2007 Apr 15;23(8):972-9. doi: 10.1093/bioinformatics/btm046. Epub 2007 Mar 24.

DOI:10.1093/bioinformatics/btm046

PMID:17384429

Abstract

MOTIVATION

The nearest shrunken centroid (NSC) method has been successfully applied in many DNA-microarray classification problems. The NSC uses 'shrunken' centroids as prototypes for each class and identifies subsets of genes that best characterize each class. Each sample is then assigned to the class with the nearest (shrunken) centroid. The NSC is easy to implement and easy to interpret; however, it has drawbacks.
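As an illustration of the mechanism described above, the following is a minimal sketch of a nearest shrunken centroid classifier, not the authors' code. It assumes a simplified form of the method: standardized class-centroid offsets are soft-thresholded by a tuning parameter `delta`, and the class-size scaling, the variance offset, and the class priors used in fuller formulations are left out. The function names `soft_threshold`, `fit_nsc`, and `predict_nsc`, and the toy data, are hypothetical.

```python
import numpy as np

def soft_threshold(d, delta):
    """Move each entry of d toward zero by delta; entries within delta of zero become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_nsc(X, y, delta):
    """Fit simplified shrunken centroids.

    X: (n_samples, n_genes) array; y: class labels; delta: shrinkage amount (assumed).
    """
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.vstack([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class standard deviation of each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    # Standardized offset of each class centroid from the overall centroid.
    d = (centroids - overall) / s
    d_shrunk = soft_threshold(d, delta)
    return {"classes": classes, "s": s, "d": d_shrunk,
            "centroids": overall + s * d_shrunk}

def predict_nsc(model, X_new):
    """Assign each row of X_new to the class with the nearest shrunken centroid."""
    diff = (X_new[:, None, :] - model["centroids"][None, :, :]) / model["s"]
    dist = (diff ** 2).sum(axis=2)
    return model["classes"][dist.argmin(axis=1)]

# Toy data: gene 0 separates the two classes, gene 1 does not.
X = np.array([[0.0, 1.0], [0.2, 1.2], [-0.2, 0.8],
              [3.0, 1.1], [3.2, 0.9], [2.8, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = fit_nsc(X, y, delta=1.0)
# A gene stays "active" only if some class keeps a nonzero shrunken offset.
active_genes = np.any(model["d"] != 0, axis=0)
labels = predict_nsc(model, np.array([[0.1, 1.0], [2.9, 1.0]]))
```

In this sketch, a gene whose shrunken offsets are zero for every class contributes the same amount to every class distance, so it has no effect on the classification. That is how shrinkage doubles as gene selection.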

RESULTS

We show that the NSC method can be interpreted in the framework of LASSO regression. Based on that, we consider two new methods, adaptive L∞-norm penalized NSC (ALP-NSC) and adaptive hierarchically penalized NSC (AHP-NSC), with two different penalty functions for microarray classification, which improve over the NSC. Unlike the L1-norm penalty used in LASSO, the penalty terms that we consider make use of the fact that parameters belonging to one gene should be treated as a natural group. Numerical results indicate that the two new methods tend to remove irrelevant genes more effectively and provide better classification results than the L1-norm approach.
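To make the contrast between the two kinds of penalty concrete, here is a hedged sketch in generic notation; the abstract does not give the formulas, so every symbol below is an assumption. Let $\mu_{jk}$ denote the centroid offset of gene $j$ in class $k$, with $p$ genes and $K$ classes, let $\lambda \ge 0$ be a tuning parameter, and let $w_j \ge 0$ be gene-specific weights (one plausible reading of "adaptive"). The exact form used in the paper may differ.

```latex
% L1-norm (LASSO-type) penalty: every parameter is penalized separately.
P_{1}(\mu) = \lambda \sum_{j=1}^{p} \sum_{k=1}^{K} \lvert \mu_{jk} \rvert

% Gene-wise L-infinity penalty: the K parameters of gene j form one group.
P_{\infty}(\mu) = \lambda \sum_{j=1}^{p} w_j \max_{1 \le k \le K} \lvert \mu_{jk} \rvert
```

Under the second form, the penalty on gene $j$ vanishes only when $\mu_{jk} = 0$ for all $k$, so this kind of penalty tends to remove or retain a gene across all classes at once, whereas the L1-norm form shrinks each $\mu_{jk}$ independently.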

AVAILABILITY

R code for the ALP-NSC and the AHP-NSC algorithms is available from the authors upon request.

Similar articles

Differential gene expression detection and sample classification using penalized linear regression models.

Bioinformatics. 2006 Feb 15;22(4):472-6. doi: 10.1093/bioinformatics/bti827. Epub 2005 Dec 13.

Classification of microarrays to nearest centroids.

Bioinformatics. 2005 Nov 15;21(22):4148-54. doi: 10.1093/bioinformatics/bti681. Epub 2005 Sep 20.

Variable selection for model-based high-dimensional clustering and its application to microarray data.

Biometrics. 2008 Jun;64(2):440-8. doi: 10.1111/j.1541-0420.2007.00922.x. Epub 2007 Oct 26.

Bias in error estimation when using cross-validation for model selection.

BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.

Hybrid huberized support vector machines for microarray classification and gene selection.

Bioinformatics. 2008 Feb 1;24(3):412-9. doi: 10.1093/bioinformatics/btm579. Epub 2008 Jan 5.

A multi-stage approach to clustering and imputation of gene expression profiles.

Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.

Independent component analysis-based penalized discriminant method for tumor classification using gene expression data.

Bioinformatics. 2006 Aug 1;22(15):1855-62. doi: 10.1093/bioinformatics/btl190. Epub 2006 May 18.

The nearest subclass classifier: a compromise between the nearest mean and nearest neighbor classifier.

IEEE Trans Pattern Anal Mach Intell. 2005 Sep;27(9):1417-29. doi: 10.1109/TPAMI.2005.187.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Cited by

Prostate cancer in young men represents a distinct clinical phenotype: gene expression signature to predict early metastases.

J Transl Genet Genom. 2021;5:50-61. doi: 10.20517/jtgg.2021.01. Epub 2021 Mar 9.

High-dimensional integrative copula discriminant analysis for multiomics data.

Stat Med. 2020 Dec 30;39(30):4869-4884. doi: 10.1002/sim.8758. Epub 2020 Oct 15.

A Cancer Biologist's Primer on Machine Learning Applications in High-Dimensional Cytometry.

Cytometry A. 2020 Aug;97(8):782-799. doi: 10.1002/cyto.a.24158. Epub 2020 Jun 30.

Nearest shrunken centroids via alternative genewise shrinkages.

PLoS One. 2017 Feb 15;12(2):e0171068. doi: 10.1371/journal.pone.0171068. eCollection 2017.

Covariance-enhanced discriminant analysis.

Biometrika. 2015;102(1):33-45. doi: 10.1093/biomet/asu049. Epub 2014 Dec 3.

Optimal Feature Selection in High-Dimensional Discriminant Analysis.

IEEE Trans Inf Theory. 2015 Feb;61(2):1063-1083. doi: 10.1109/TIT.2014.2381241.

Identification of significant features in DNA microarray data.

Wiley Interdiscip Rev Comput Stat. 2013 Jul;5(4). doi: 10.1002/wics.1260.

Block-diagonal discriminant analysis and its bias-corrected rules.

Stat Appl Genet Mol Biol. 2013 Jun;12(3):347-59. doi: 10.1515/sagmb-2012-0017.

Improved shrunken centroid classifiers for high-dimensional class-imbalanced data.

BMC Bioinformatics. 2013 Feb 23;14:64. doi: 10.1186/1471-2105-14-64.

A ROAD to Classification in High Dimensional Space.

J R Stat Soc Series B Stat Methodol. 2012 Sep;74(4):745-771. doi: 10.1111/j.1467-9868.2012.01029.x. Epub 2012 Apr 12.
