Pan Wenliang, Tian Yuan, Wang Xueqin, Zhang Heping
Sun Yat-sen University.
Yale University.
Ann Stat. 2018 Jun;46(3):1109-1137. doi: 10.1214/17-AOS1579.
In this paper, we first introduce Ball Divergence, a novel measure of the difference between two probability measures in separable Banach spaces, and show that the Ball Divergence of two probability measures is zero if and only if the two measures are identical, without any moment assumption. Using Ball Divergence, we present a metric rank test procedure for testing the equality of the distributions underlying independent samples. Because the procedure is rank-based, it is robust to outliers and heavy-tailed data. We show that this multivariate two-sample test statistic is a consistent estimator of the Ball Divergence, and that it converges to a mixture of χ² distributions under the null hypothesis and to a normal distribution under the alternative hypothesis. Importantly, we prove the test's consistency against general alternatives. Moreover, this result does not depend on the ratio of the two sample sizes, so the test can be applied to imbalanced data. Numerical studies confirm that our test outperforms several existing tests in terms of Type I error and power. We conclude the paper with two applications of our method: virtual screening in the drug development process, and genome-wide expression analysis in hormone replacement therapy.
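The abstract does not state the sample statistic. As a minimal sketch, assuming the empirical form we understand the paper to use (Euclidean distance ρ, closed balls B̄(a, r) = {z : ρ(a, z) ≤ r}), the statistic is D_{n,m} = A_{n,m} + C_{n,m}. Here A_{n,m} = n⁻² Σ_{i,j} (A^X_{ij} − A^Y_{ij})², where A^X_{ij} and A^Y_{ij} are the fractions of the X and Y samples lying in B̄(X_i, ρ(X_i, X_j)), and C_{n,m} is defined the same way with balls centred at the Y points and normalised by m². Each ball fraction is the rank of a distance divided by the sample size, which is what makes this a metric rank statistic. The function names below are hypothetical, and the p-value comes from a permutation test, swapped in for the asymptotic null distribution described in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _ball_mass(sample: List[Point], center: Point, radius: float) -> float:
    """Fraction of `sample` inside the closed ball of `radius` around `center`."""
    return sum(math.dist(p, center) <= radius for p in sample) / len(sample)


def ball_divergence(x: List[Point], y: List[Point]) -> float:
    """Brute-force sample Ball Divergence D = A + C (assumed form, Euclidean metric)."""
    n, m = len(x), len(y)
    # A: balls centred at X_i passing through X_j; compare X-mass with Y-mass.
    a = 0.0
    for xi in x:
        for xj in x:
            r = math.dist(xi, xj)
            a += (_ball_mass(x, xi, r) - _ball_mass(y, xi, r)) ** 2
    # C: balls centred at Y_k passing through Y_l; same comparison.
    c = 0.0
    for yk in y:
        for yl in y:
            r = math.dist(yk, yl)
            c += (_ball_mass(x, yk, r) - _ball_mass(y, yk, r)) ** 2
    return a / n**2 + c / m**2


def permutation_test(
    x: List[Point], y: List[Point], n_perm: int = 199, seed: int = 0
) -> Tuple[float, float]:
    """Permutation p-value for H0: X and Y share one distribution (hypothetical helper)."""
    rng = random.Random(seed)
    observed = ball_divergence(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if ball_divergence(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    xs = [(0.1 * i,) for i in range(15)]
    ys = [(10 + 0.1 * i,) for i in range(15)]
    stat, p = permutation_test(xs, ys, n_perm=99)
    print(f"BD = {stat:.3f}, permutation p-value = {p:.3f}")
```

This brute-force version needs O((n + m)³) distance evaluations per statistic, and the permutation loop multiplies that by `n_perm`, so it only suits small samples.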