Jean Morrison, Noah Simon
Department of Human Genetics, University of Chicago, Chicago, IL.
Department of Biostatistics, University of Washington, Seattle, WA.
J Comput Graph Stat. 2018;27(3):648-656. doi: 10.1080/10618600.2017.1411270. Epub 2018 Jun 14.
Confidence interval procedures used in low-dimensional settings are often inappropriate for high-dimensional applications. When many parameters are estimated, marginal confidence intervals associated with the most significant estimates have very low coverage rates: they are too narrow and centered at biased estimates. The problem of forming confidence intervals in high-dimensional settings has previously been studied through the lens of selection adjustment. In that framework, the goal is to control the proportion of non-covering intervals among those formed for selected parameters. In this paper we approach the problem by considering the relationship between rank and coverage probability. Marginal confidence intervals have very low coverage rates for the most significant parameters and high coverage rates for parameters with less significant estimates. Many selection-adjusted intervals behave the same way, despite controlling the coverage rate within the selected set. This relationship between rank and coverage rate means that the parameters most likely to be pursued in follow-up or replication studies are the least likely to be covered by the constructed intervals. We propose rank conditional coverage (RCC) as a new coverage criterion for confidence intervals in multiple testing and covering problems. The RCC is the expected coverage rate of an interval given the significance ranking of the associated estimator. We also propose two methods that use bootstrapping to construct confidence intervals that control the RCC. Because these methods use the additional information captured by the ranks of the parameter estimates, they often produce narrower intervals than marginal or selection-adjusted methods. These methods are implemented in R (R Core Team, 2017) in the package rcc, available on CRAN at https://cran.r-project.org/web/packages/rcc/index.html.
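To make the RCC concrete, write s(i) for the index of the parameter whose estimate is the i-th most significant. The RCC at rank i is the probability that the interval reported for θ_{s(i)} actually covers θ_{s(i)}. The Monte Carlo sketch below estimates this quantity for ordinary marginal intervals, which lets one observe the low coverage at the top ranks described above. It is a minimal illustration under assumptions of our own: independent normal estimates with a known common σ, ranking by |θ̂_j|, and a hypothetical function name `marginal_rcc`; it is not code from the rcc package.

```python
import numpy as np
from statistics import NormalDist


def marginal_rcc(theta, sigma=1.0, alpha=0.1, n_reps=500, seed=0):
    """Monte Carlo estimate of the rank conditional coverage (RCC) of the
    marginal intervals theta_hat_j +/- z_{1-alpha/2} * sigma.

    Illustrative assumptions: theta_hat_j ~ N(theta_j, sigma^2) independently
    with known common sigma, and estimates are ranked by |theta_hat_j|
    (rank 1 = most significant). Entry i of the result estimates the coverage
    probability of the interval attached to the (i+1)-th ranked estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = np.zeros(theta.size)
    for _ in range(n_reps):
        theta_hat = theta + sigma * rng.standard_normal(theta.size)
        order = np.argsort(-np.abs(theta_hat))            # indices, most significant first
        covered = np.abs(theta_hat - theta) <= z * sigma  # coverage indexed by parameter
        hits += covered[order]                            # re-index coverage by rank
    return hits / n_reps


if __name__ == "__main__":
    # Hypothetical setting: 50 true effects of size 3 among 1,000 parameters.
    theta = np.zeros(1000)
    theta[:50] = 3.0
    rcc = marginal_rcc(theta)
    print("RCC at ranks 1-5:", rcc[:5].round(2))
    print("average over ranks:", rcc.mean().round(3))  # close to 1 - alpha
```

Because every parameter's marginal interval has coverage 1 − α, the RCC values average to roughly 1 − α across ranks. The criterion is informative because it exposes how unevenly that coverage is distributed by rank.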
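The abstract says that bootstrapping can produce intervals that control the RCC but does not spell out the steps. The sketch below shows one way a parametric bootstrap can target rank-conditional coverage: treat the observed estimates as the truth, resample, record for each rank the error of whichever estimate lands at that rank, and invert the per-rank error quantiles. The function name `rank_conditional_bootstrap_ci`, the known common σ, ranking by |θ̂_j|, and the sign-adjusted pivot sign(est)·(est − truth) are all our own illustrative assumptions. This is not the rcc package's interface and may differ in detail from the two methods in the paper.

```python
import numpy as np


def rank_conditional_bootstrap_ci(theta_hat, sigma=1.0, alpha=0.1, n_boot=1000, seed=0):
    """Parametric-bootstrap sketch of intervals aimed at rank conditional coverage.

    Illustrative assumptions (not the rcc package's API): estimates are
    independent N(theta_j, sigma^2) with known common sigma, significance
    is ranked by |theta_hat_j|, and the pivot is the sign-adjusted error
    sign(est) * (est - truth) of the estimate holding each rank.
    Returns (lower, upper), indexed like theta_hat.
    """
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = theta_hat.size
    # err[b, i]: sign-adjusted error of the rank-i estimate in replicate b,
    # treating the observed estimates as the bootstrap "truth".
    err = np.empty((n_boot, p))
    for b in range(n_boot):
        theta_star = theta_hat + sigma * rng.standard_normal(p)
        order_star = np.argsort(-np.abs(theta_star))
        est = theta_star[order_star]
        sign = np.where(est >= 0, 1.0, -1.0)
        err[b] = sign * (est - theta_hat[order_star])
    q_lo = np.quantile(err, alpha / 2, axis=0)      # per-rank error quantiles
    q_hi = np.quantile(err, 1 - alpha / 2, axis=0)

    # Invert the pivot for the parameter holding rank i in the observed data:
    # sign(t) * (t - theta) in [q_lo_i, q_hi_i]  =>  theta = t - sign(t) * err.
    order = np.argsort(-np.abs(theta_hat))
    t = theta_hat[order]
    s = np.where(t >= 0, 1.0, -1.0)
    end_a, end_b = t - s * q_lo, t - s * q_hi
    lower, upper = np.empty(p), np.empty(p)
    lower[order] = np.minimum(end_a, end_b)
    upper[order] = np.maximum(end_a, end_b)
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = np.zeros(1000)
    theta[:50] = 3.0  # hypothetical true effects
    theta_hat = theta + rng.standard_normal(theta.size)
    lower, upper = rank_conditional_bootstrap_ci(theta_hat)
    top = np.argmax(np.abs(theta_hat))
    print("top estimate:", theta_hat[top].round(2),
          "interval:", (lower[top].round(2), upper[top].round(2)))
```

Because the error quantiles are computed separately for each rank, the interval for a top-ranked estimate is typically shifted toward zero by the estimated selection bias. An isolated, very large estimate gets an interval close to the marginal one, because it holds the same rank in nearly every bootstrap replicate.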