Shan Junhao, Wang Lei
School of Statistics and Data Science, KLMDASR, LEBPS and LPMC, Nankai University, Tianjin, People's Republic of China.
J Appl Stat. 2024 Feb 11;51(14):2719-2743. doi: 10.1080/02664763.2024.2315467. eCollection 2024.
For high-dimensional generalized linear models (GLMs) with massive data, this paper investigates a unified optimal Poisson subsampling scheme for estimation and inference on a prespecified low-dimensional partition of the full parameter vector. A Poisson subsampling decorrelated score function is proposed to mitigate the adverse effect of the less accurate, slowly converging estimate of the nuisance parameter. The resulting Poisson subsample estimator is proved to be consistent and asymptotically normal, and a more general optimal subsampling criterion, which includes the A- and L-optimality criteria as special cases, is formulated to improve estimation efficiency. We also propose a two-step algorithm for implementation and discuss some practical issues. The satisfactory performance of our method is validated through simulation studies and a real dataset.
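As a rough illustration of the two-step Poisson subsampling idea described in the abstract, the following Python sketch treats a low-dimensional logistic regression only. It is not the paper's decorrelated score procedure for high-dimensional GLMs. The function names (`weighted_logistic_fit`, `poisson_subsample`, `two_step_estimate`), the pilot and second-step sizes `r0` and `r`, and the choice of sampling scores proportional to |y_i - p_i| * ||x_i|| (an L-optimality-style score commonly used in the subsampling literature) are all assumptions made for this example.

```python
import numpy as np

def weighted_logistic_fit(X, y, w, n_iter=25):
    """Weighted logistic MLE via Newton's method (tiny ridge for stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        hess += 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def poisson_subsample(probs, r, rng):
    """Keep point i independently with probability min(1, r * probs[i]).

    Returns the kept indices and their inclusion probabilities.
    The realized subsample size is random with expectation at most r.
    """
    incl = np.minimum(1.0, r * probs)
    keep = rng.random(len(probs)) < incl
    return np.flatnonzero(keep), incl[keep]

def two_step_estimate(X, y, r0, r, rng):
    """Hypothetical two-step scheme: uniform pilot, then score-based sampling."""
    n = len(y)
    # Step 1: uniform Poisson pilot subsample and unweighted pilot fit.
    idx0, _ = poisson_subsample(np.full(n, 1.0 / n), r0, rng)
    beta0 = weighted_logistic_fit(X[idx0], y[idx0], np.ones(len(idx0)))
    # Step 2: sampling probabilities from the pilot fit (assumed score form).
    p0 = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p0) * np.linalg.norm(X, axis=1)
    probs = score / score.sum()
    idx, incl = poisson_subsample(probs, r, rng)
    # Inverse-probability weights correct for the non-uniform sampling.
    return weighted_logistic_fit(X[idx], y[idx], 1.0 / incl)
```

Because each observation is kept or dropped independently, the inclusion decision can be made in a single pass over the data, which is the usual motivation for Poisson subsampling over sampling with replacement when the full dataset is very large.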