Wang Sijian, Zhu Ji
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2007 Apr 15;23(8):972-9. doi: 10.1093/bioinformatics/btm046. Epub 2007 Mar 24.
The nearest shrunken centroid (NSC) method has been successfully applied to many DNA-microarray classification problems. The NSC uses 'shrunken' centroids as prototypes for each class and identifies the subsets of genes that best characterize each class. A sample is then assigned to the class with the nearest (shrunken) centroid. The NSC is easy to implement and to interpret; however, it has drawbacks.
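The procedure described above can be sketched as follows. This is a minimal illustration of the commonly described NSC recipe (standardized centroid differences, soft-thresholding, nearest-centroid assignment), not the authors' R code. The function names `soft_threshold`, `fit_nsc`, and `predict_nsc`, the threshold parameter `delta`, and the choice of the median standard deviation as the offset `s0` are assumptions made for this example.

```python
import numpy as np

def soft_threshold(d, delta):
    """Shrink each value toward zero by delta; values within delta of zero become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)

def fit_nsc(X, y, delta):
    """Fit shrunken centroids. X is (samples x genes), y holds class labels."""
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)                                   # overall centroid per gene
    centroids = np.stack([X[y == k].mean(axis=0) for k in classes])
    counts = np.array([(y == k).sum() for k in classes])
    # Pooled within-class standard deviation per gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s0 = np.median(s)                                          # assumed offset choice
    m = np.sqrt(1.0 / counts - 1.0 / n)
    scale = m[:, None] * (s + s0)[None, :]                     # (classes x genes)
    d = (centroids - overall) / scale                          # standardized differences
    d_shrunk = soft_threshold(d, delta)
    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,               # shrunken centroids
        "d_shrunk": d_shrunk,
        "s": s + s0,
        "priors": counts / n,
    }

def predict_nsc(model, X):
    """Assign each sample to the class with the nearest shrunken centroid."""
    diff = X[:, None, :] - model["centroids"][None, :, :]
    dist = ((diff / model["s"]) ** 2).sum(axis=2) - 2.0 * np.log(model["priors"])
    return model["classes"][np.argmin(dist, axis=1)]
```

In this sketch, a gene whose shrunken differences are zero for every class contributes equally to all class distances, so it drops out of the classification rule. That is how the threshold selects genes.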
We show that the NSC method can be interpreted in the framework of LASSO regression. Building on this interpretation, we consider two new methods for microarray classification, the adaptive L∞-norm penalized NSC (ALP-NSC) and the adaptive hierarchically penalized NSC (AHP-NSC), which use two different penalty functions and improve on the NSC. Unlike the L1-norm penalty used in LASSO, the penalty terms that we consider exploit the fact that the parameters belonging to one gene form a natural group. Numerical results indicate that the two new methods tend to remove irrelevant genes more effectively and to classify more accurately than the L1-norm approach.
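To illustrate the difference between an elementwise penalty and a gene-wise grouped penalty, the sketch below compares an L1 penalty with an L∞-norm group penalty on a small genes-by-classes parameter matrix. The abstract does not give the exact penalty formulas or the adaptive weights, so the function names and the optional `weights` argument here are hypothetical and the code should not be read as the ALP-NSC or AHP-NSC algorithm itself.

```python
import numpy as np

def l1_penalty(D):
    """Sum of absolute values: every (gene, class) parameter is penalized separately."""
    return float(np.abs(D).sum())

def linf_group_penalty(D, weights=None):
    """Sum over genes of the largest absolute parameter in that gene's row.

    D is (genes x classes). `weights` is a hypothetical per-gene weight vector
    standing in for adaptive weights; it defaults to 1 for every gene.
    """
    group_max = np.abs(D).max(axis=1)
    if weights is None:
        weights = np.ones_like(group_max)
    return float((weights * group_max).sum())

# Rows are genes, columns are classes.
D = np.array([[2.0, -2.0],
              [0.0,  0.0],
              [1.0,  0.0]])
print(l1_penalty(D))          # 5.0
print(linf_group_penalty(D))  # 3.0
```

Under the grouped penalty a gene costs nothing only when all of its class parameters are zero, which is one way to encourage removing a gene as a whole rather than one parameter at a time.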
R code for the ALP-NSC and AHP-NSC algorithms is available from the authors upon request.