Division of Biostatistics and Bioinformatics, Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania.
Biostatistics and Bioinformatics Branch, National Institute of Child Health and Human Development, NIH, Bethesda, Maryland.
Biometrics. 2021 Jun;77(2):519-532. doi: 10.1111/biom.13330. Epub 2020 Jul 25.
Longitudinal data are widely used in practice, but outcomes or time-dependent risk factors are often missing, which makes the data highly unbalanced and complex. Missing data can follow various patterns and mechanisms, and handling them properly to obtain unbiased and valid inference remains a significant challenge. Here, we propose a novel semiparametric framework for analyzing longitudinal data in which both responses and covariates are intermittently missing at random, a general situation widely encountered in observational studies. Within this framework, we consider multiply robust estimation procedures based on innovative calibrated propensity scores. These procedures further relax the requirement that the missing data mechanism be correctly specified, and they show more satisfactory numerical performance. We also develop a corresponding robust information criterion, based on empirical likelihood methods, for consistent variable selection in the proposed model. The proposed methods are evaluated both theoretically and through extensive simulation studies under a variety of settings, and they show competitive properties and advantages over existing approaches. We illustrate the utility of our approach by analyzing data from the HIV Epidemiology Research Study.
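The abstract does not spell out the estimator. To illustrate the general idea behind multiply robust estimation with calibrated propensity scores, the following is a minimal sketch of a much simpler, cross-sectional analogue, not the authors' longitudinal procedure. It makes several assumptions. The target is a marginal mean E[Y]. Only the response is missing, at random given a single covariate. Several candidate propensity-score and outcome-regression models are fitted. The complete cases then receive empirical-likelihood weights that are calibrated so that, for each candidate model, the weighted mean of its fitted values among complete cases matches its mean over the full sample. The estimate should remain consistent if any one candidate model is correctly specified. All function names, the data-generating model, and the candidate designs are hypothetical illustrations, and the sketch does not cover missing covariates, intermittent missingness, or the information criterion.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(Z, r, max_iter=100, tol=1e-10):
    """Logistic regression of the response indicator r on design Z (Newton-Raphson)."""
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        p = expit(Z @ beta)
        hess = Z.T @ (Z * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, Z.T @ (r - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def el_weights(g, max_iter=100, tol=1e-10):
    """Empirical-likelihood weights w_i = 1 / (m (1 + rho'g_i)) with sum_i w_i g_i = 0."""
    m, d = g.shape
    rho = np.zeros(d)
    for _ in range(max_iter):
        denom = 1.0 + g @ rho
        grad = (g / denom[:, None]).sum(axis=0)      # estimating equation for rho
        if np.max(np.abs(grad)) < tol:
            break
        hess = (g / denom[:, None] ** 2).T @ g       # sum_i g_i g_i' / denom_i^2
        step = np.linalg.solve(hess, grad)           # Newton step on the convex dual
        t = 1.0
        while np.any(1.0 + g @ (rho + t * step) <= 1.0 / m):  # keep 0 < w_i < 1
            t /= 2.0
        rho = rho + t * step
    return 1.0 / (m * (1.0 + g @ rho))

def multiply_robust_mean(y, r, pi_designs, mu_designs):
    """Hypothetical multiply robust estimate of E[Y] with Y missing at random."""
    cc = r == 1
    cols = []
    for Z in pi_designs:                             # candidate propensity-score models
        pi_hat = expit(Z @ fit_logistic(Z, r))
        cols.append(pi_hat[cc] - pi_hat.mean())
    for Z in mu_designs:                             # candidate outcome models (OLS on complete cases)
        beta = np.linalg.lstsq(Z[cc], y[cc], rcond=None)[0]
        a_hat = Z @ beta
        cols.append(a_hat[cc] - a_hat.mean())
    w = el_weights(np.column_stack(cols))            # calibrated weights for complete cases
    return float(np.sum(w * y[cc])), w

# Simulated example: true E[Y] = 1, response probability depends on x only (MAR).
rng = np.random.default_rng(2021)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
r = rng.binomial(1, expit(0.5 + x))
y_obs = np.where(r == 1, y, np.nan)
ones = np.ones(n)
correct = np.column_stack([ones, x])                 # correctly specified design
wrong = np.column_stack([ones, x ** 2])              # misspecified design
est, w = multiply_robust_mean(y_obs, r, [wrong, correct], [wrong, correct])
cc_mean = float(np.nanmean(y_obs))                   # naive complete-case mean (biased upward here)
```

In this simulation, larger x values are more likely to be observed, so the complete-case mean overstates E[Y]. The calibrated weights correct for this because each candidate list contains one correctly specified model. Solving for the weights through the Lagrange multiplier `rho`, instead of optimizing over all m weights directly, keeps the problem d-dimensional, where d is the number of candidate models.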