Gelaye Bizu, Tadesse Mahlet G, Williams Michelle A, Fann Jesse R, Vander Stoep Ann, Andrew Zhou Xiao-Hua
Department of Epidemiology, Harvard School of Public Health, Boston, MA; Department of Epidemiology, University of Washington School of Public Health, Seattle, WA.
Department of Mathematics and Statistics, Georgetown University, Washington, DC.
Ann Epidemiol. 2014 Jul;24(7):527-31. doi: 10.1016/j.annepidem.2014.04.009. Epub 2014 May 2.
We evaluated the extent to which use of a hypothesized imperfect gold standard, the Composite International Diagnostic Interview (CIDI), biases estimates of the diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9). We also evaluated how statistical correction can be used to address this bias.
The study included 926 adults who completed structured interviews assessing current major depressive disorder with the PHQ-9 and CIDI instruments. First, we evaluated the relative psychometric properties of the PHQ-9 using the CIDI as a gold standard. Next, we used a Bayesian latent class model to correct for the bias.
In comparison with the CIDI, the relative sensitivity and specificity of the PHQ-9 for detecting major depressive disorder at a cut point of 10 or more were 53.1% (95% confidence interval, 45.4%-60.8%) and 77.5% (95% confidence interval, 74.5%-80.5%), respectively. Using a Bayesian latent class model to correct for the bias arising from the use of an imperfect gold standard increased the estimated sensitivity and specificity of the PHQ-9 to 79.8% (95% Bayesian credible interval, 64.9%-90.8%) and 79.1% (95% Bayesian credible interval, 74.7%-83.7%), respectively.
Our results provide evidence that the diagnostic validity of mental health screening instruments can be assessed with appropriate statistical methods in settings where a gold standard is not available.
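The direction of the bias reported above can be illustrated with a short calculation. The sketch below is not the Bayesian latent class model used in the study. It only shows, under an assumed conditional independence between the index test and the reference test given true disease status, how an imperfect reference standard distorts the apparent sensitivity and specificity of an index test. The function name `apparent_accuracy` and all numeric inputs are hypothetical and chosen for illustration; they are not estimates from the study.

```python
def apparent_accuracy(prevalence, se_index, sp_index, se_ref, sp_ref):
    """Apparent sensitivity/specificity of an index test scored against an
    imperfect reference test.

    Assumption (not from the study): the two tests err independently
    given the true disease status.
    """
    p = prevalence
    # Joint probability that both tests are positive.
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    # Probability that the reference test is positive.
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    # Joint probability that both tests are negative.
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref
    # Probability that the reference test is negative.
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical inputs: true prevalence 20%, index test 80%/80%,
# reference test 60% sensitive and 90% specific.
se_app, sp_app = apparent_accuracy(0.2, 0.8, 0.8, 0.6, 0.9)
print(f"apparent sensitivity: {se_app:.3f}, apparent specificity: {sp_app:.3f}")
```

With these illustrative inputs, the apparent sensitivity falls well below the true value of the index test, which is the kind of underestimation a latent class model is intended to correct. With a perfect reference (`se_ref = sp_ref = 1`), the apparent and true values coincide.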