Guangzhou Institute of International Finance, Guangzhou University, Guangzhou, Guangdong 510006, China.
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae088.
The geometric median, which is applicable to high-dimensional data, generalizes the univariate median of one-dimensional data. It serves as a robust estimator of location for multi-dimensional data and has a wide range of real-world applications. This paper studies high-dimensional multivariate analysis of variance (MANOVA) based on the geometric median. A maximum-type statistic that relies on the differences between the geometric medians of the groups is introduced. The distribution of the new test statistic is derived under the null hypothesis using Gaussian approximations, and its consistency under the alternative hypothesis is established. To approximate the distribution of the new statistic in high dimensions, a wild bootstrap algorithm is proposed and theoretically justified. Through simulation studies conducted across a variety of dimensions, sample sizes, and data-generating models, we demonstrate the finite-sample performance of our geometric median-based MANOVA method. Additionally, we apply the proposed approach to a breast cancer gene expression dataset.
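To make the central quantity concrete, the following is a minimal sketch of how a geometric median can be computed and how a maximum-type contrast between group medians might be formed. It is an illustration under stated assumptions, not the paper's procedure: the Weiszfeld iteration is a standard algorithm swapped in here for the computation, the function names `geometric_median` and `max_median_difference` are hypothetical, and the contrast is left unscaled, whereas the paper's statistic and its wild bootstrap calibration are not specified in this abstract.

```python
import numpy as np


def geometric_median(X, tol=1e-8, max_iter=500):
    """Approximate the point minimizing the sum of Euclidean distances
    to the rows of X, using the Weiszfeld fixed-point iteration."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d               # inverse-distance weights
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def max_median_difference(groups):
    """Largest coordinate-wise absolute difference between the geometric
    medians of any two groups (unscaled; an assumption for illustration)."""
    medians = [geometric_median(g) for g in groups]
    best = 0.0
    for i in range(len(medians)):
        for j in range(i + 1, len(medians)):
            diff = float(np.max(np.abs(medians[i] - medians[j])))
            best = max(best, diff)
    return best


# Usage: two small groups that differ by a shift of 3 in the first coordinate.
group_a = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
group_b = group_a + np.array([3.0, 0.0])
stat = max_median_difference([group_a, group_b])
```

In a test of this kind, a large value of such a contrast would count as evidence against equal locations across groups; the reference distribution would come from a calibration step such as the wild bootstrap mentioned in the abstract, which this sketch does not attempt.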