Qiao Xingye, Zhang Hao Helen, Liu Yufeng, Todd Michael J, Marron J S
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599.
J Am Stat Assoc. 2010 Mar 1;105(489):401-414. doi: 10.1198/jasa.2010.tm08487.
While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, appropriately weighted versions of DWD (wDWD) offer major improvements. A major contribution of this paper is the development of optimal weighting schemes for various nonstandard classification problems. In addition, we discuss several alternative criteria, propose an adaptive weighting scheme (awDWD), and demonstrate its advantages over nonadaptive weighting schemes in some situations. The second major contribution is a theoretical study of weighted DWD. Both high-dimension, low-sample-size (HDLSS) asymptotics and Fisher consistency of DWD are studied. The performance of weighted DWD is evaluated using simulated examples and two real data examples. The theoretical results are also confirmed by simulations.
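To make the idea of class weighting concrete, the sketch below evaluates a weighted DWD objective for a linear rule f(x) = w·x + b. It assumes one common way of writing DWD as a loss on the functional margin u = y f(x), namely V(u) = 2√C − Cu for u ≤ 1/√C and V(u) = 1/u otherwise; the paper's own formulation may differ. Each observation's loss is multiplied by a class weight W(y). The function names and the inverse-class-frequency default for W(y) are hypothetical illustrations, not the optimal weighting schemes derived in the paper. The sketch also omits DWD's norm constraint on w and the optimization step; it only computes the objective.

```python
import math


def dwd_loss(u, C=1.0):
    """DWD loss on the margin u = y * f(x) (assumed parameterization).

    Linear (hinge-like) branch for small margins, 1/u tail for large ones;
    the two branches meet with equal value and slope at u = 1/sqrt(C).
    """
    threshold = 1.0 / math.sqrt(C)
    if u <= threshold:
        return 2.0 * math.sqrt(C) - C * u
    return 1.0 / u


def weighted_dwd_risk(w, b, X, y, C=1.0, weights=None):
    """Sum of W(y_i) * V(y_i * f(x_i)) for the linear rule f(x) = w.x + b.

    `weights` maps each label (+1 / -1) to a class weight W(y).
    Hypothetical default: inverse class frequency, so each class
    contributes its average loss regardless of its sample size.
    """
    if weights is None:
        n_pos = sum(1 for label in y if label == 1)
        n_neg = sum(1 for label in y if label == -1)
        weights = {1: 1.0 / n_pos, -1: 1.0 / n_neg}

    total = 0.0
    for x_i, y_i in zip(X, y):
        # Linear discriminant value f(x_i) = w . x_i + b
        f_x = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        total += weights[y_i] * dwd_loss(y_i * f_x, C)
    return total
```

With unit weights for both classes this reduces to the unweighted DWD loss sum; changing W(+1) relative to W(−1) shifts how much each class pulls on the fitted boundary, which is the lever the weighted variants adjust.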