Department of Global Health and Population, Harvard School of Public Health, Boston, MA, USA.
Epidemiology. 2011 Jan;22(1):27-35. doi: 10.1097/EDE.0b013e3181ffa201.
HIV prevalence estimates from population-based surveys are vulnerable to selection bias if HIV status is missing for a proportion of the eligible population. Standard approaches to correcting prevalence estimates for selective nonparticipation, such as imputation, assume that data are "missing at random." These approaches yield biased estimates if unobserved factors are associated with both survey participation and HIV status.
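The mechanism can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the survey analysis: the function name `simulate`, the intercepts 0.7 and -1.0, and the sample size are assumptions chosen only so that roughly three quarters of simulated people participate and about 16% are HIV-positive. The only link between participation and HIV status is a correlation `rho` between two unobserved error terms, so no observed covariate is available for an imputation model to exploit.

```python
import numpy as np

def simulate(rho, n=200_000, seed=0):
    """Return (true prevalence, prevalence among participants) for synthetic data.

    rho is the correlation between the unobserved determinants of
    participation and of HIV status. All parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    # u drives participation, e drives HIV status; both are unobserved.
    u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    participates = (0.7 + u) > 0      # about 76% agree to be tested
    hiv_positive = (-1.0 + e) > 0     # about 16% are truly HIV-positive
    true_prev = hiv_positive.mean()
    observed_prev = hiv_positive[participates].mean()
    return true_prev, observed_prev

if __name__ == "__main__":
    for rho in (0.0, -0.75):
        true_prev, observed_prev = simulate(rho)
        print(f"rho={rho:+.2f}  true={true_prev:.3f}  participants only={observed_prev:.3f}")
```

With `rho = 0` the prevalence among participants tracks the true prevalence. With a negative `rho` it falls well below it, and because the dependence runs entirely through unobserved terms, a correction that assumes data are missing at random cannot recover the difference.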
We use Heckman-type selection models to test and correct for selection on unobserved factors (separately for men and women) in the 2007 Zambia Demographic and Health Survey, in which 28% of the 7146 eligible men and 23% of the 7408 eligible women did not participate in HIV testing. Performance of these models depends crucially on selection variables that determine survey participation but do not independently affect HIV status.
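For readers unfamiliar with the approach, a Heckman-type selection model for a binary outcome is commonly written as a bivariate probit with sample selection. The formulation below is a generic textbook sketch, not the authors' exact specification, which the abstract does not give. The symbols are assumptions introduced here: $s_i$ indicates consent to HIV testing, $h_i$ indicates HIV-positive status, $x_i$ are observed covariates, and $z_i$ are the selection variables that enter the participation equation only.

```latex
\begin{aligned}
s_i^{*} &= x_i'\delta + z_i'\gamma + u_i, & s_i &= \mathbf{1}\{s_i^{*} > 0\},\\
h_i^{*} &= x_i'\beta + \varepsilon_i,     & h_i &= \mathbf{1}\{h_i^{*} > 0\}
          \quad \text{(observed only if } s_i = 1\text{)},\\
(u_i, \varepsilon_i) &\sim \mathcal{N}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
\right).
\end{aligned}
```

In this notation, $\rho = 0$ corresponds to no selection on unobserved factors, and the exclusion of $z_i$ from the HIV equation is what the requirement that selection variables "do not independently affect HIV status" amounts to.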
We identify two highly plausible selection variables that are statistically significant determinants of survey participation: interviewer identity and visit on the first day of fieldwork in a survey cluster. HIV-positive status was negatively correlated with consent to test in men (ρ = -0.75 [95% confidence interval = -0.94 to -0.18]), but not in women. Adjusting for selection on unobserved variables substantially increased the HIV prevalence estimate for men, from 12% (based on measured HIV status alone) and 12% (based on imputation) to 21%. In addition, the adjustment for selection substantially changed the estimated effects of HIV risk factors.
Studies of HIV prevalence and risk factors based on surveys with substantial nonparticipation should routinely use Heckman-type selection models to correct for selection on unobserved variables.