Pang Herbert, Tong Tiejun, Zhao Hongyu
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina 27705, USA.
Biometrics. 2009 Dec;65(4):1021-9. doi: 10.1111/j.1541-0420.2009.01200.x.
High-dimensional data such as microarrays pose new statistical challenges. For example, classifying samples using a large number of genes measured on only a small number of microarrays remains a difficult problem. Diagonal discriminant analysis, support vector machines, and k-nearest neighbors have been suggested as among the best methods for small sample sizes, but none has been found to be superior to the others. In this article, we propose an improved diagonal discriminant approach based on shrinkage and regularization of the variances. The performance of the new approach, along with that of existing methods, is studied through simulations and applications to real data. These studies show that the proposed shrinkage-based and regularized diagonal discriminant methods have lower misclassification rates than existing methods in many cases.
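The general idea of a diagonal discriminant rule with shrunken variances can be illustrated with a minimal sketch. This is not the estimator proposed in the article, whose details the abstract does not give. It assumes a pooled per-gene variance that is linearly shrunk toward the average variance across genes. The function names `fit_shrunken_dlda` and `predict_shrunken_dlda` and the shrinkage weight `alpha` are hypothetical and introduced only for illustration.

```python
import numpy as np

def fit_shrunken_dlda(X, y, alpha=0.5):
    """Fit a diagonal linear discriminant rule with shrunken variances.

    X: (n_samples, n_genes) array; y: (n_samples,) class labels.
    alpha in [0, 1] is a hypothetical shrinkage weight (0 = no shrinkage).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted unique labels
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    # Pooled within-class variance for each gene.
    resid = X - means[np.searchsorted(classes, y)]
    pooled = (resid ** 2).sum(axis=0) / (len(y) - len(classes))
    # Assumed shrinkage target: the average variance across all genes.
    shrunk = (1.0 - alpha) * pooled + alpha * pooled.mean()
    return classes, means, priors, shrunk

def predict_shrunken_dlda(model, X):
    """Assign each row of X to the class with the smallest discriminant score."""
    classes, means, priors, shrunk = model
    X = np.asarray(X, dtype=float)
    # Score: variance-scaled squared distance to the class mean, minus 2 * log prior.
    diff = X[:, None, :] - means[None, :, :]
    scores = (diff ** 2 / shrunk).sum(axis=2) - 2.0 * np.log(priors)
    return classes[scores.argmin(axis=1)]
```

Because the covariance matrix is treated as diagonal, only one variance per gene has to be estimated, which is what keeps the rule usable when genes far outnumber samples. Shrinking those variances toward a common value is one simple way to stabilize them when each is estimated from only a few arrays.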