Xia Lu, Nan Bin, Li Yi
Department of Biostatistics, University of Washington, Seattle, Washington, USA.
Department of Statistics, University of California, Irvine, Irvine, California, USA.
Scand Stat Theory Appl. 2023 Jun;50(2):550-571. doi: 10.1111/sjos.12595. Epub 2022 Apr 25.
For statistical inference on regression models with a diverging number of covariates, the existing literature typically makes sparsity assumptions on the inverse of the Fisher information matrix. Such assumptions, however, are often violated under Cox proportional hazards models, leading to biased estimates and confidence intervals with coverage below the nominal level. We propose a modified debiased lasso method, which solves a series of quadratic programming problems to approximate the inverse information matrix without imposing sparsity assumptions on that matrix. We establish asymptotic results for the estimated regression coefficients when the dimension of the covariates diverges with the sample size. As demonstrated by extensive simulations, our proposed method provides consistent estimates and confidence intervals with nominal coverage probabilities. The utility of the method is further demonstrated by assessing the effects of genetic markers on patients' overall survival in the Boston Lung Cancer Survival Cohort, a large-scale epidemiological study investigating the mechanisms underlying lung cancer.
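The abstract does not state the estimator explicitly. As a rough sketch only, the following shows the generic form that a debiased lasso with a quadratic-programming approximation of the inverse information matrix usually takes in the literature. All notation here is an assumption introduced for illustration and is not taken from the abstract: \ell_n is the negative log partial likelihood divided by n, \hat\beta is the lasso estimate, \hat\Sigma is the Hessian of \ell_n at \hat\beta, e_j is the j-th unit vector, and \gamma_n is a tuning parameter. The paper's exact formulation may differ.

```latex
% Illustrative sketch under assumed notation; not quoted from the paper.
% Step 1: for each coordinate j = 1, ..., p, solve a quadratic program
% for the j-th row of the approximate inverse information matrix.
\[
  \hat m_j \;=\; \arg\min_{m \in \mathbb{R}^p} \; m^\top \hat\Sigma\, m
  \quad \text{subject to} \quad
  \bigl\lVert \hat\Sigma\, m - e_j \bigr\rVert_\infty \le \gamma_n ,
  \qquad
  \hat\Theta \;=\; (\hat m_1, \dots, \hat m_p)^\top .
\]
% Step 2: apply a one-step bias correction to the lasso estimate.
\[
  \hat b \;=\; \hat\beta \;-\; \hat\Theta\, \nabla \ell_n(\hat\beta) .
\]
% Step 3: form a Wald-type interval for the j-th coefficient.
\[
  \hat b_j \;\pm\; z_{1-\alpha/2}\,
  \sqrt{ \bigl(\hat\Theta\, \hat\Sigma\, \hat\Theta^\top\bigr)_{jj} \big/ n } .
\]
```

In this form, the constraint only requires \hat\Sigma \hat m_j to be close to e_j entrywise, so no row of the inverse information matrix needs to be sparse, which is the feature the abstract emphasizes.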