Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina.
Department of Biostatistics and Bioinformatics, The Biostatistics Center, Milken Institute School of Public Health, The George Washington University, Rockville, Maryland.
Stat Med. 2023 May 20;42(11):1641-1668. doi: 10.1002/sim.9692. Epub 2023 Mar 7.
Design-based analysis, which accounts for the design features of the study, is commonly used to analyze data from studies with complex survey sampling, such as the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In longitudinal studies of this type, attrition is often a problem. Although various statistical approaches have been proposed to handle attrition, such as inverse probability weighting (IPW), non-response cell weighting (NRCW), multiple imputation (MI), and full information maximum likelihood (FIML), there has not been a systematic assessment comparing their performance in design-based analyses. In this article, we perform extensive simulation studies and compare the performance of different missing data methods in linear and generalized linear population models and under different missing data mechanisms. We find that design-based analysis produces valid estimation and statistical inference when the missing data are handled appropriately using IPW, NRCW, MI, or FIML under a missing-completely-at-random (MCAR) or missing-at-random (MAR) mechanism, provided that the missingness model is correctly specified or over-specified. We also illustrate the use of these methods using data from HCHS/SOL.
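As a rough illustration of the weighting idea behind NRCW, the sketch below inflates each respondent's sampling weight so that respondents in an adjustment cell carry the total weight of everyone originally sampled in that cell. It is a minimal toy example and not the authors' implementation: the record layout, the function names `nrcw_weights` and `weighted_mean`, and the data are all hypothetical, and it covers only point estimation of a mean, ignoring the strata, clusters, and variance estimation that a full design-based analysis would require.

```python
from collections import defaultdict


def nrcw_weights(records):
    """Return non-response-adjusted weights (0.0 for non-respondents).

    Each record is a dict with hypothetical keys:
    'weight' (base sampling weight), 'cell' (adjustment cell label),
    'responded' (bool), and 'y' (outcome, None if missing).
    """
    total_w = defaultdict(float)  # weight of all sampled units per cell
    resp_w = defaultdict(float)   # weight of respondents per cell
    for r in records:
        total_w[r["cell"]] += r["weight"]
        if r["responded"]:
            resp_w[r["cell"]] += r["weight"]

    adjusted = []
    for r in records:
        if not r["responded"]:
            adjusted.append(0.0)
            continue
        # Inverse of the weighted response rate within the cell.
        factor = total_w[r["cell"]] / resp_w[r["cell"]]
        adjusted.append(r["weight"] * factor)
    return adjusted


def weighted_mean(records, weights):
    """Weighted mean of 'y' over units with a positive weight."""
    num = sum(w * r["y"] for r, w in zip(records, weights) if w > 0)
    den = sum(w for w in weights if w > 0)
    return num / den


# Toy data: cell A fully responds; half of cell B is lost to attrition.
records = [
    {"weight": 10.0, "cell": "A", "responded": True, "y": 1.0},
    {"weight": 10.0, "cell": "A", "responded": True, "y": 2.0},
    {"weight": 10.0, "cell": "A", "responded": True, "y": 1.0},
    {"weight": 10.0, "cell": "A", "responded": True, "y": 2.0},
    {"weight": 10.0, "cell": "B", "responded": True, "y": 3.0},
    {"weight": 10.0, "cell": "B", "responded": True, "y": 5.0},
    {"weight": 10.0, "cell": "B", "responded": False, "y": None},
    {"weight": 10.0, "cell": "B", "responded": False, "y": None},
]

# Complete-case estimate: base weights, respondents only.
cc_weights = [r["weight"] if r["responded"] else 0.0 for r in records]
cc_mean = weighted_mean(records, cc_weights)

# NRCW estimate: respondents in cell B are up-weighted by a factor of 2.
adjusted = nrcw_weights(records)
nrcw_mean = weighted_mean(records, adjusted)
```

In this toy data the complete-case mean under-represents cell B, whereas the adjusted weights restore each cell's share of the total weight. That correction is only appropriate when missingness is unrelated to the outcome within cells, which is the MAR-given-cell assumption.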