McGovern Mark E, Bärnighausen Till, Marra Giampiero, Radice Rosalba
From the (a) Harvard Center for Population and Development Studies, Cambridge, MA; (b) Department of Global Health and Population, Harvard School of Public Health, Boston, MA; (c) Wellcome Trust Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Mtubatuba, South Africa; (d) Department of Statistical Science, University College London, London, UK; and (e) Department of Economics, Mathematics and Statistics, Birkbeck, University of London, London, UK.
Epidemiology. 2015 Mar;26(2):229-37. doi: 10.1097/EDE.0000000000000218.
Heckman-type selection models have been used to correct HIV prevalence estimates for selection bias when participation in HIV testing and HIV status remain associated after controlling for observed variables. These models typically rely on the strong assumption that the error terms in the participation and outcome equations that make up the model follow a bivariate normal distribution.
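The selection problem described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the function name `simulate_selection`, the correlation of -0.6, the 20% prevalence, and the 75% consent rate are all hypothetical values chosen for illustration, and covariates are omitted for brevity.

```python
import math
import random
from statistics import NormalDist


def simulate_selection(n=100_000, rho=-0.6, true_prev=0.20,
                       consent_rate=0.75, seed=1):
    """Simulate consent to testing and HIV status from latent variables
    whose errors are bivariate normal with correlation rho.

    All parameter values are illustrative assumptions, not estimates
    from the Zambia DHS.
    Returns (population prevalence, prevalence among those tested).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Thresholds chosen so that the marginal probabilities match the inputs.
    cut_y = std_normal.inv_cdf(1.0 - true_prev)
    cut_s = std_normal.inv_cdf(1.0 - consent_rate)

    n_positive = n_tested = n_tested_positive = 0
    for _ in range(n):
        e_s = rng.gauss(0.0, 1.0)
        # Build an outcome error with correlation rho to the participation error.
        e_y = rho * e_s + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        is_positive = e_y > cut_y
        consents = e_s > cut_s
        n_positive += is_positive
        if consents:
            n_tested += 1
            n_tested_positive += is_positive
    return n_positive / n, n_tested_positive / n_tested


if __name__ == "__main__":
    population, tested = simulate_selection()
    print(f"population prevalence:   {population:.3f}")
    print(f"prevalence among tested: {tested:.3f}")
```

With a negative correlation, people who are HIV positive are less likely to consent, so prevalence among those tested understates population prevalence. Setting `rho=0` removes the gap, which is the case in which an analysis restricted to those tested would be unbiased.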
We introduce a novel approach for relaxing the bivariate normality assumption in selection models using copula functions. We apply this method to estimate HIV prevalence, with new confidence intervals (CIs), in the 2007 Zambia Demographic and Health Survey (DHS), using interviewer identity as a selection variable that predicts participation (consent to test) but not the outcome (HIV status).
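To make the role of the copula concrete, the sketch below computes the three cell probabilities that enter the likelihood of a binary selection model once the joint probability of consenting and testing positive is written as a copula of the two marginal probabilities. It rests on assumptions not stated in the abstract: probit margins, the formulation P(S=1, Y=1) = C(P(S=1), P(Y=1); theta), and the Clayton and Frank families as examples. The function names and the linear predictors `eta_s` and `eta_y` are hypothetical.

```python
import math
from statistics import NormalDist


def clayton(u, v, theta):
    """Clayton copula for theta > 0 (positive, lower-tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def frank(u, v, theta):
    """Frank copula for theta != 0 (negative theta gives negative dependence)."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def cell_probabilities(eta_s, eta_y, copula, theta):
    """Likelihood cells for one respondent under probit margins.

    eta_s and eta_y are the (hypothetical) linear predictors of the
    participation and outcome equations.
    """
    std_normal = NormalDist()
    p_s = std_normal.cdf(eta_s)      # P(consents to test)
    p_y = std_normal.cdf(eta_y)      # P(HIV positive), marginal
    p11 = copula(p_s, p_y, theta)    # P(consents and HIV positive)
    return {
        "not_tested": 1.0 - p_s,
        "tested_positive": p11,
        "tested_negative": p_s - p11,
    }


def prevalence_among_tested(cells):
    """P(HIV positive | tested), implied by the cell probabilities."""
    tested = cells["tested_positive"] + cells["tested_negative"]
    return cells["tested_positive"] / tested


if __name__ == "__main__":
    # Same margins, two different dependence structures.
    for name, copula, theta in [("Frank", frank, -5.0), ("Clayton", clayton, 2.0)]:
        cells = cell_probabilities(0.67, -0.84, copula, theta)
        print(name, {k: round(p, 3) for k, p in cells.items()},
              "prevalence among tested:", round(prevalence_among_tested(cells), 3))
```

Swapping the `copula` argument changes only the dependence structure while leaving both margins untouched, which is what allows prevalence estimates to be compared across alternative assumptions about the association between participation and outcome.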
We show in a simulation study that selection models can generate biased results when the bivariate normality assumption is violated. In the 2007 Zambia DHS, HIV prevalence estimates are similar irrespective of the structure of the association assumed between participation and outcome. For men, we estimate a population HIV prevalence of 21% (95% CI = 16%-25%) compared with 12% (11%-13%) among those who consented to be tested; for women, the corresponding figures are 19% (13%-24%) and 16% (15%-17%).
Copula approaches to Heckman-type selection models are a useful addition to the methodological toolkit of HIV epidemiology and of epidemiology in general. We develop the use of this approach to systematically evaluate the robustness of HIV prevalence estimates based on selection models, both empirically and in a simulation study.