Tan Ji-Ping, Li Nan, Lan Xiao-Yang, Zhang Shi-Ming, Cui Bo, Liu Li-Xin, He Xin, Zeng Lin, Tau Li-Yuan, Zhang Hua, Wang Xiao-Xiao, Wang Lu-Ning, Zhao Yi-Ming
Department of Geriatric Neurology, Chinese PLA General Hospital, Beijing, PR China.
Research Center of Clinical Epidemiology, Peking University Third Hospital, Beijing, PR China.
Arch Gerontol Geriatr. 2017 Nov;73:43-49. doi: 10.1016/j.archger.2017.07.009. Epub 2017 Jul 20.
Although several statistical methods for adjusting for missing data have been developed and are widely applied in research, few studies have investigated how these methods perform when adjusting for missingness in datasets used to estimate the prevalence of dementia. We attempted to develop a more feasible approach for handling missingness in a cross-sectional study among the elderly.
Five methods of estimating prevalence, namely stratified weighting (SW), inverse-probability weighting (IPW), hot deck imputation (HDI), ordinal logistic regression (OLR) and multiple imputation (MI), were applied to handle the missing data in a dataset that included 2231 non-responders.
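The abstract does not give formulas for these methods, so the following is only a minimal sketch of the simplest one, stratified weighting, in its generic textbook form: the prevalence observed among responders in each stratum is weighted by that stratum's share of the full sample, non-responders included. The function names, the age strata and the toy counts are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def complete_case_prevalence(records):
    """Prevalence among responders only, ignoring non-responders."""
    observed = [is_case for _, responded, is_case in records if responded]
    return sum(observed) / len(observed)


def stratified_weighted_prevalence(records):
    """Stratified weighting (generic form, assumed for illustration).

    records: iterable of (stratum, responded, is_case) tuples, where
    is_case is None for non-responders. Each stratum's responder
    prevalence is weighted by the stratum's share of the whole sample.
    """
    sampled = defaultdict(int)     # everyone sampled in the stratum
    responders = defaultdict(int)  # those with an observed outcome
    cases = defaultdict(int)       # observed cases among responders
    for stratum, responded, is_case in records:
        sampled[stratum] += 1
        if responded:
            responders[stratum] += 1
            cases[stratum] += int(is_case)

    total = sum(sampled.values())
    estimate = 0.0
    for stratum, n_sampled in sampled.items():
        if responders[stratum] == 0:
            raise ValueError(f"no responders in stratum {stratum!r}")
        stratum_prevalence = cases[stratum] / responders[stratum]
        estimate += (n_sampled / total) * stratum_prevalence
    return estimate


def build_toy_records():
    """Hypothetical data: two age strata with different response rates."""
    records = []
    # Stratum "65-74": 100 sampled, 80 responded, 4 cases.
    records += [("65-74", True, True)] * 4
    records += [("65-74", True, False)] * 76
    records += [("65-74", False, None)] * 20
    # Stratum "75+": 100 sampled, 50 responded, 10 cases.
    records += [("75+", True, True)] * 10
    records += [("75+", True, False)] * 40
    records += [("75+", False, None)] * 50
    return records


if __name__ == "__main__":
    toy = build_toy_records()
    print("complete case:", complete_case_prevalence(toy))
    print("stratified weighting:", stratified_weighted_prevalence(toy))
```

In this toy sample the older stratum has both a higher prevalence and a lower response rate, so the complete case estimate (14 cases among 130 responders, about 10.8%) falls below the stratified-weighting estimate (12.5%). Weighting responders by the inverse of a response probability estimated per stratum would give the same result here; the two approaches differ once the response model uses covariates beyond the strata.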
Compared with the results of the complete case analysis, the differences in the prevalence rates of dementia and mild cognitive impairment (MCI) calculated by the prevalence-estimating methods after adjusting for non-responders were less than 7% and 6%, respectively. In contrast to the results of the other methods, the estimated prevalence of dementia and MCI calculated by MI increased when more predictive factors were included, and the lowest rate of missing data was achieved using MI. Using the participants' ages, cognitive screening scores and activities of daily living scores as predictive variables when correcting for missingness had relatively larger effects on the estimated dementia prevalence.
When adjusting for missingness while estimating the prevalence of dementia in cross-sectional studies, a simple method, such as SW, is recommended when limited information is available, whereas MI is the preferred method when additional information is available. Further simulation studies are needed to determine the optimal approach.