Multiple imputation for non-response when estimating HIV prevalence using survey data.

Author information

Chinomona Amos, Mwambi Henry

Affiliations

Department of Statistics, Rhodes University, Grahamstown, South Africa.

School of Mathematics, Statistics and Computer Science, University of Kwa-Zulu Natal, Pietermaritzburg, South Africa.

Publication information

BMC Public Health. 2015 Oct 16;15:1059. doi: 10.1186/s12889-015-2390-1.

Abstract

BACKGROUND

Missing data are common in many areas of research, especially those involving survey data in the biological, health and social sciences. Most analyses of survey data take a complete-case approach, that is, list-wise deletion of all cases with missing values, on the assumption that the values are missing completely at random (MCAR). Methods that substitute a single value for each missing value, such as the last value carried forward, the mean, or a regression prediction (single imputation), are also used. These methods often result in potentially biased estimates, loss of statistical information, and loss of the distributional relationships between variables. In addition, the strong MCAR assumption is not tenable in most practical instances.
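The loss of distributional information under single imputation can be seen in a toy case. The sketch below uses made-up numbers (not data from the study) and shows that filling every gap with the observed mean leaves the mean unchanged but shrinks the sample variance, because the filled-in values add no spread.

```python
from statistics import mean, variance

# Hypothetical measurements; None marks a missing value.
values = [2.0, 4.0, None, 6.0, None, 8.0]

# Complete-case view: keep only the observed values.
observed = [v for v in values if v is not None]
fill = mean(observed)

# Single (mean) imputation: every gap receives the same value.
filled = [fill if v is None else v for v in values]

print("observed mean, variance:", mean(observed), variance(observed))
print("filled   mean, variance:", mean(filled), variance(filled))
```

The artificially small variance is one reason standard errors computed from a singly imputed data set tend to overstate precision.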

METHODS

Since missing data are a major problem in HIV research, this study illustrates the strength of multiple imputation as a method of handling missing data. That strength comes from its ability to draw multiple values for each missing observation from a plausible predictive distribution. This is particularly important in HIV research in sub-Saharan Africa, where accurate collection of complete data is still a challenge. Furthermore, multiple imputation accounts for the uncertainty introduced by the very process of imputing values for the missing observations. National and subgroup estimates of HIV prevalence in Zimbabwe were computed using multiply imputed data sets from the 2010-11 Zimbabwe Demographic and Health Survey (2010-11 ZDHS). A survey logistic regression model relating HIV prevalence to demographic and socio-economic variables was used as the substantive analysis model. The results of both the complete-case analysis and the multiple imputation analysis are presented and discussed.
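The idea of drawing several plausible values for each missing observation can be sketched in a few lines. The example below is not the authors' imputation model, and it uses Python rather than R. It makes three simplifying assumptions: a binary HIV status, missingness that is ignorable within each stratum, and a uniform Beta(1, 1) prior on stratum prevalence. It also ignores survey weights. The records, stratum labels and function name are hypothetical.

```python
import random

def impute_once(records, rng):
    """Return one completed copy of `records`.

    `records` is a list of (stratum, status) pairs, with status 1, 0 or None.
    Each missing status is drawn from the stratum's posterior predictive
    distribution: first a prevalence from the Beta posterior, then a
    Bernoulli draw given that prevalence.
    """
    counts = {}  # stratum -> [positives observed, total observed]
    for stratum, status in records:
        c = counts.setdefault(stratum, [0, 0])
        if status is not None:
            c[0] += status
            c[1] += 1
    # One prevalence draw per stratum, so parameter uncertainty is propagated.
    drawn_p = {s: rng.betavariate(1 + pos, 1 + n - pos)
               for s, (pos, n) in counts.items()}
    completed = []
    for stratum, status in records:
        if status is None:
            status = 1 if rng.random() < drawn_p[stratum] else 0
        completed.append((stratum, status))
    return completed

# Hypothetical toy data: two strata, some statuses missing.
records = [("urban", 1), ("urban", 0), ("urban", None), ("urban", 0),
           ("rural", 0), ("rural", None), ("rural", 1), ("rural", 0),
           ("rural", 0), ("rural", None)]

rng = random.Random(2015)  # fixed seed so the run is repeatable
m = 5                      # number of imputed data sets
completed_sets = [impute_once(records, rng) for _ in range(m)]

# Analyse each completed data set separately (here: crude prevalence).
prevalences = [sum(s for _, s in cs) / len(cs) for cs in completed_sets]
print(prevalences)
```

Because each completed data set receives different draws, the spread of the five estimates reflects the uncertainty that imputation itself introduces.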

RESULTS

Across different subgroups of the population, the crude estimates of HIV prevalence are generally not identical, but they vary in a consistent way under the two approaches (complete-case analysis and multiple imputation analysis). The estimated standard errors are predominantly smaller under multiple imputation than under the complete-case analysis, leading to narrower confidence intervals. Under the logistic regression model, the adjusted odds ratios vary greatly between the two approaches. The model-based confidence intervals for the adjusted odds ratios are wider under multiple imputation, which reflects the inclusion of a combined measure of the within- and between-imputation variability.
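The combined measure of within- and between-imputation variability is conventionally obtained with Rubin's rules. The abstract does not spell out the formula, so the sketch below assumes the standard form: the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and the input numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total

# Hypothetical results from m = 5 imputed data sets.
estimates = [0.150, 0.154, 0.148, 0.152, 0.156]
variances = [0.0001] * 5

q_bar, within, between, total = pool_rubin(estimates, variances)
print("pooled estimate:", q_bar)
print("pooled standard error:", math.sqrt(total))
```

The between-imputation term is what a single imputed data set cannot supply, so the pooled variance is always at least as large as the average within-imputation variance.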

CONCLUSIONS

There is considerable variation between the estimates obtained from the two approaches. Multiple imputation allows the uncertainty introduced by the imputation process to be measured. This yields more reliable estimates of the parameters of interest and reduces the chance of declaring effects significant when they are not (type I error). In addition, the powerful and flexible statistical computing packages available in R facilitate the computations.
