Tong Tiejun, Feng Zeny, Hilton Julia S, Zhao Hongyu
Department of Mathematics, Hong Kong Baptist University, Hong Kong; Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong.
J Appl Stat. 2013 Jan 1;40(9):1949-1964. doi: 10.1080/02664763.2013.800035.
Estimating the proportion of true null hypotheses, π, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π are motivated by the assumption that the test statistics are independent, which often does not hold in practice. Simulations indicate that most existing estimators can perform poorly when the test statistics are dependent, mainly because the variation of these estimators increases. In this paper, we propose several data-driven methods for estimating π that incorporate the distribution pattern of the observed p-values as a practical way to address potential dependence among test statistics. Specifically, instead of using the expected proportion 1 - λ, we use a linear fit to give a data-driven estimate of the proportion of true-null p-values that fall in (λ, 1] relative to the whole range [0, 1]. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.
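For orientation, the following is a minimal sketch of the kind of λ-based baseline that the abstract contrasts against, in the form commonly attributed to Storey: under independence, true-null p-values are uniform on [0, 1], so a fraction 1 - λ of them is expected to fall in (λ, 1]. The function name `storey_pi0`, the default `lam=0.5`, and the cap at 1 are illustrative assumptions. The sketch does not reproduce the linear-fit estimators proposed in the paper, whose details are not given in the abstract.

```python
def storey_pi0(p_values, lam=0.5):
    """Baseline estimate of the true-null proportion.

    Computes #{p_i > lam} / (m * (1 - lam)), i.e. the observed count of
    p-values above lam divided by the count expected if all m hypotheses
    were true nulls with uniform p-values.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must be non-empty")
    # Count the p-values in the tail interval (lam, 1].
    tail_count = sum(1 for p in p_values if p > lam)
    # Assumption: cap at 1, since a proportion cannot exceed 1.
    return min(1.0, tail_count / (m * (1.0 - lam)))


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.01, 0.02, 0.03, 0.04, 0.10, 0.20, 0.60, 0.90]
    print(storey_pi0(example, lam=0.5))
```

The fixed denominator m(1 - λ) is where the uniformity assumption enters; the abstract's proposal is to replace the expected proportion 1 - λ with a data-driven estimate.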