Gilbody Simon, Richards David, Brealey Stephen, Hewitt Catherine
Department of Health Sciences, University of York, York, YO10 5DD, UK.
J Gen Intern Med. 2007 Nov;22(11):1596-602. doi: 10.1007/s11606-007-0333-y. Epub 2007 Sep 14.
To summarize the psychometric properties of the PHQ2 and PHQ9 as screening instruments for depression.
We searched electronic databases from 1994 to February 2007 (MEDLINE, PsycLIT, EMBASE, CINAHL, Cochrane registers) and study reference lists. We identified 17 validation studies conducted in primary care, medical outpatients, and specialist medical services (cardiology, gynecology, stroke, dermatology, head injury, and otolaryngology). Translations included US English, Dutch, Italian, Spanish, German, and Arabic. Summary sensitivity, specificity, likelihood ratios, and diagnostic odds ratios (OR) against a gold standard (DSM-IV Major Depressive Disorder, MDD) were calculated for each study. We used random effects bivariate meta-analysis at recommended cut points to produce summary receiver-operator characteristic (sROC) curves. We explored heterogeneity with metaregression.
Fourteen studies (5,026 participants) validated the PHQ9 against MDD: sensitivity = 0.80 (95% CI 0.71-0.87); specificity = 0.92 (95% CI 0.88-0.95); positive likelihood ratio = 10.12 (95% CI 6.52-15.67); negative likelihood ratio = 0.22 (95% CI 0.15-0.32). There was substantial heterogeneity (diagnostic odds ratio heterogeneity I² = 82%), which was not explained by study setting (primary care versus general hospital), method of scoring (cutoff ≥ 10 versus "diagnostic algorithm"), or study quality (blinded versus unblinded). The PHQ2 was validated in only 3 studies and showed wide variability in sensitivity.
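As a rough illustration of how the reported likelihood ratios relate to sensitivity and specificity, the sketch below applies the standard textbook formulas to the pooled point estimates (0.80 and 0.92). It is not the bivariate random-effects model used in the review, so the results only approximate the published pooled values (for example, about 10.0 here versus the reported 10.12). The function names and the 10% pre-test prevalence are assumptions chosen for illustration and do not come from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) from point estimates."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive screen raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative screen lowers the odds
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Pooled PHQ9 point estimates reported in the abstract.
    lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.80, specificity=0.92)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")

    # Hypothetical 10% prevalence of MDD (an assumption, not a study figure).
    prevalence = 0.10
    print(f"P(MDD | positive screen) = {post_test_probability(prevalence, lr_pos):.3f}")
    print(f"P(MDD | negative screen) = {post_test_probability(prevalence, lr_neg):.3f}")
```

Under the assumed 10% prevalence, a positive PHQ9 screen raises the probability of MDD to roughly one in two, while a negative screen lowers it to a few percent. Different prevalence assumptions would shift both figures.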
The PHQ9 is acceptable, and as good as longer clinician-administered instruments in a range of settings, countries, and populations. More research is needed to validate the PHQ2 to see if its diagnostic properties approach those of the PHQ9.