Dukes Oliver, Avagyan Vahe, Vansteelandt Stijn
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
Mathematical and Statistical Methods Group, Wageningen University and Research, Wageningen, The Netherlands.
Biometrics. 2020 Dec;76(4):1190-1200. doi: 10.1111/biom.13231. Epub 2020 Feb 28.
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (eg, the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.
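As a hedged illustration of what double robustness means for a test of the null, consider a generic score-type statistic built from the product of exposure and outcome residuals. The notation is introduced here for illustration and is not taken from the abstract: L denotes the covariates, m(L; α) a working model for E(A | L), g(L; γ) a working model for E(Y | L) under the null, and U_i and T_n are labels chosen for this sketch. It shows the general construction only and does not reproduce the authors' amended procedure for estimating the nuisance parameters.

```latex
U_i(\hat\alpha, \hat\gamma)
  = \{A_i - m(L_i; \hat\alpha)\}\,\{Y_i - g(L_i; \hat\gamma)\},
\qquad
T_n = \frac{n^{-1/2} \sum_{i=1}^{n} U_i(\hat\alpha, \hat\gamma)}
           {\sqrt{\, n^{-1} \sum_{i=1}^{n} U_i(\hat\alpha, \hat\gamma)^{2} \,}} .
```

Under the null E(Y | A, L) = E(Y | L), the product of residuals has mean zero at the limiting values of the nuisance parameters if either m(L; α) equals E(A | L) or g(L; γ) equals E(Y | L); only one of the two working models needs to be correct. In this sketch, T_n would be compared with standard normal quantiles, with the nuisance parameters fitted by penalized maximum likelihood (for example, the lasso). Whether that normal approximation holds uniformly depends on how the nuisance parameters are estimated, which is the question the paper addresses.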