Department of Statistics, University of South Carolina, Columbia, SC 29208, United States of America.
Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae101.
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. To select important covariates while accounting for the missing information in the response data, we apply the expectation-maximization (EM) algorithm to compute maximum likelihood estimators subject to LASSO penalization. After variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods, which use the same data for both variable selection and inference without adjusting for the selection step.
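As a rough illustration of the estimation step only, the sketch below runs an EM loop for LASSO-penalized logistic regression when each true binary response is hidden behind one error-prone test result. It is not the authors' implementation. It assumes individual (not pooled) tests with known sensitivity `se` and specificity `sp`, an unpenalized intercept, and an M-step approximated by a few proximal-gradient (soft-thresholding) updates; the function names, the penalty level `lam`, and the iteration counts are hypothetical. The post-selection inference step based on the polyhedral lemma is not shown.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def em_lasso_logistic(X, T, se, sp, lam, n_em=200, n_prox=20):
    """EM sketch for L1-penalized logistic regression with misclassified responses.

    X   : (n, d) covariate matrix
    T   : (n,) observed test outcomes in {0, 1} (error-prone versions of the truth)
    se  : assumed known test sensitivity, P(T = 1 | Y = 1)
    sp  : assumed known test specificity, P(T = 0 | Y = 0)
    lam : LASSO penalty level applied to the slope coefficients
    """
    n, d = X.shape
    Xb = np.column_stack([np.ones(n), X])  # first column is the intercept
    beta = np.zeros(d + 1)
    # Step size 1/L, where L = ||Xb||_2^2 / (4 n) bounds the curvature
    # of the average logistic loss.
    step = 4.0 * n / np.linalg.norm(Xb, 2) ** 2

    for _ in range(n_em):
        # E-step: posterior probability that the true response is 1,
        # given the observed test outcome and the current fit.
        p = sigmoid(Xb @ beta)
        like1 = np.where(T == 1, se, 1.0 - se)   # P(T_i | Y_i = 1)
        like0 = np.where(T == 1, 1.0 - sp, sp)   # P(T_i | Y_i = 0)
        w = like1 * p / (like1 * p + like0 * (1.0 - p))

        # M-step (approximate): proximal-gradient updates on the expected
        # complete-data negative log-likelihood plus the L1 penalty.
        for _ in range(n_prox):
            grad = Xb.T @ (sigmoid(Xb @ beta) - w) / n
            beta = beta - step * grad
            beta[1:] = soft_threshold(beta[1:], step * lam)  # intercept unpenalized

    return beta
```

Calling `em_lasso_logistic(X, T, se=0.95, sp=0.98, lam=0.05)` returns the intercept followed by the slope estimates; covariates whose slopes are exactly zero are the ones dropped by the penalty. Because the M-step is only partially solved at each pass, this is a generalized EM scheme, and in practice `lam` would be tuned, for example by cross-validation or an information criterion.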