From the Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.
Epidemiology. 2023 Mar 1;34(2):192-200. doi: 10.1097/EDE.0000000000001572. Epub 2022 Dec 29.
When accounting for misclassification, investigators make assumptions about whether misclassification is "differential" or "nondifferential." Most guidance on differential misclassification considers settings where outcome misclassification varies across levels of exposure, or vice versa. Here, we examine when covariate-differential misclassification must be considered when estimating overall outcome prevalence.
We generated datasets with outcome misclassification under five data-generating mechanisms. In each, we estimated prevalence using estimators that (a) ignored misclassification, (b) assumed misclassification was nondifferential, and (c) allowed misclassification to vary across levels of a covariate. Using different sources of validation data to account for misclassification, we compared bias and precision in estimated prevalence in the study sample and in an external target population. We illustrated the use of each approach by estimating HIV prevalence from self-reported HIV status among people in East Africa cross-border areas.
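The abstract does not state the estimators in closed form. As a hedged sketch, the three approaches can be written with the standard Rogan-Gladen correction, which is an assumption here and not necessarily the authors' exact estimator. The symbols are illustrative: $\hat{p}^{*}$ is the observed (misclassified) prevalence, $\widehat{Se}$ and $\widehat{Sp}$ are sensitivity and specificity estimated from validation data, and $Z$ is a discrete covariate with levels $z$.

```latex
% (a) Ignore misclassification: report the observed prevalence as is.
\hat{\pi}_{a} = \hat{p}^{*}

% (b) Assume nondifferential misclassification: one pooled Se and Sp.
\hat{\pi}_{b} = \frac{\hat{p}^{*} + \widehat{Sp} - 1}{\widehat{Se} + \widehat{Sp} - 1}

% (c) Allow misclassification to vary with Z: correct within each level z,
%     then standardize to the covariate distribution of the population of interest.
\hat{\pi}_{c} = \sum_{z} \hat{P}(Z = z)\,
    \frac{\hat{p}^{*}_{z} + \widehat{Sp}_{z} - 1}{\widehat{Se}_{z} + \widehat{Sp}_{z} - 1}
```

In form (c), a stratum with few validated cases or noncases gives an unstable $\widehat{Se}_{z}$ or $\widehat{Sp}_{z}$, which is consistent with the higher variability reported for sparse validation strata.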
The estimator that allowed misclassification to vary across levels of the covariate produced results with little bias for both populations in all scenarios but had higher variability when the validation study contained sparse strata. Estimators that assumed nondifferential misclassification produced results with little bias when the covariate distribution in the validation data matched the covariate distribution in the target population; otherwise estimates assuming nondifferential misclassification were biased.
If validation data are a simple random sample from the target population, assuming nondifferential outcome misclassification will yield prevalence estimates with little bias regardless of whether misclassification varies across covariates. Otherwise, obtaining valid prevalence estimates requires incorporating covariates into the estimators used to account for misclassification.
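The conclusion above can be checked with a small deterministic calculation. The sketch below is not the authors' simulation: it assumes a binary covariate, uses the Rogan-Gladen correction as a stand-in for the misclassification-adjusted estimators, and works with population-level (expected) quantities, so there is no sampling error. All numeric values and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical inputs for a binary covariate Z (levels 0 and 1).
prev = [0.10, 0.30]        # true outcome prevalence within each level of Z
se = [0.90, 0.60]          # sensitivity of the outcome measure within each level
sp = [0.95, 0.99]          # specificity of the outcome measure within each level
target_w = [0.5, 0.5]      # covariate distribution in the target population
mismatch_w = [0.9, 0.1]    # covariate distribution in a non-representative validation study


def rogan_gladen(p_obs, sens, spec):
    """Correct an observed prevalence for known sensitivity and specificity."""
    return (p_obs + spec - 1.0) / (sens + spec - 1.0)


def observed_by_stratum(prev, se, sp):
    """Expected misclassified prevalence in each stratum: true plus false positives."""
    return [s * p + (1.0 - c) * (1.0 - p) for p, s, c in zip(prev, se, sp)]


def pooled_accuracy(prev, se, sp, weights):
    """Sensitivity and specificity a validation study would estimate if strata were pooled."""
    cases = sum(w * p for w, p in zip(weights, prev))
    noncases = sum(w * (1.0 - p) for w, p in zip(weights, prev))
    se_pooled = sum(w * p * s for w, p, s in zip(weights, prev, se)) / cases
    sp_pooled = sum(w * (1.0 - p) * c for w, p, c in zip(weights, prev, sp)) / noncases
    return se_pooled, sp_pooled


p_obs_z = observed_by_stratum(prev, se, sp)
true_prev = sum(w * p for w, p in zip(target_w, prev))

# (a) Ignore misclassification.
naive = sum(w * p for w, p in zip(target_w, p_obs_z))

# (b) Assume nondifferential misclassification, validation data representative of the target.
se_m, sp_m = pooled_accuracy(prev, se, sp, target_w)
nondiff_matched = rogan_gladen(naive, se_m, sp_m)

# (b') Same assumption, but validation covariate distribution differs from the target.
se_x, sp_x = pooled_accuracy(prev, se, sp, mismatch_w)
nondiff_mismatched = rogan_gladen(naive, se_x, sp_x)

# (c) Correct within each stratum, then standardize to the target covariate distribution.
stratified = sum(
    w * rogan_gladen(p, s, c) for w, p, s, c in zip(target_w, p_obs_z, se, sp)
)

for label, value in [
    ("true prevalence", true_prev),
    ("ignore misclassification", naive),
    ("nondifferential, matched validation", nondiff_matched),
    ("nondifferential, mismatched validation", nondiff_mismatched),
    ("covariate-stratified", stratified),
]:
    print(f"{label:40s} {value:.4f}")
```

Under these assumed values, the pooled correction recovers the true prevalence only when the validation covariate distribution matches the target, while the stratified correction recovers it in both cases. The calculation does not show the variance cost of sparse strata, which would require simulating finite validation samples.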