Mitchell Alex J, Coyne James C
Department of Liaison Psychiatry, Leicester General Hospital, Leicester.
Br J Gen Pract. 2007 Feb;57(535):144-51.
Guidance from the National Institute for Health and Clinical Excellence recommends one or two questions as a possible screening method for depression. Ultra-short (one-, two-, three- or four-item) tests have appeal due to their simple administration but their accuracy has not been established.
To determine whether ultra-short screening instruments accurately detect depression in primary care.
Pooled analysis and meta-analysis.
A literature search revealed 75 possible studies and from these, 22 STARD-compliant studies (Standards for Reporting of Diagnostic Accuracy) involving ultra-short tests were entered in the analysis.
Meta-analysis revealed a performance accuracy better than chance (P<0.001). More usefully for clinicians, pooled analysis of single-question tests revealed an overall sensitivity of 32.0% and specificity of 97.0% (positive predictive value [PPV] was 55.6% and negative predictive value [NPV] was 92.3%). For two- and three-item tests, overall sensitivity on pooled analysis was 73.7% and specificity was 74.7%, with a PPV of only 38.3% but a pooled NPV of 93.0%. The Youden index for single-item and multiple-item tests was 0.289 and 0.47 respectively, suggesting superiority of multiple-item tests. Re-analysis examining only 'either or' strategies improved the 'rule in' ability of two- and three-question tests (sensitivity 79.4% and NPV 94.7%) but at the expense of being able to rule out a possible diagnosis if the result was negative.
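The accuracy measures quoted above can be related to one another with a short sketch. This is illustrative only and is not the authors' analysis code. It assumes the standard definitions (Youden index = sensitivity + specificity - 1; PPV and NPV derived from sensitivity, specificity and prevalence). The function names are hypothetical, and prevalence is an input the reader must supply, since the abstract does not report it. Because the inputs here are the rounded pooled percentages, the computed Youden values may differ slightly from the reported 0.289 and 0.47.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1 (0 = chance, 1 = perfect)."""
    return sensitivity + specificity - 1.0


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All arguments are proportions between 0 and 1. The prevalence is an
    assumption supplied by the caller; it is not stated in the abstract.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled (rounded) figures from the abstract.
single_item_j = youden_index(0.320, 0.970)
multi_item_j = youden_index(0.737, 0.747)

print(f"Youden index, single-item tests: {single_item_j:.3f}")
print(f"Youden index, two- and three-item tests: {multi_item_j:.3f}")
```

Calling `predictive_values` with a range of plausible prevalences shows how strongly PPV depends on how common depression is in the screened population, which is why a positive ultra-short screen needs second-stage assessment.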
A one-question test identifies only three out of every 10 patients with depression in primary care, and is thus unacceptable if relied on alone. Ultra-short two- or three-question tests perform better, identifying eight out of 10 cases, but at the expense of a high false-positive rate (only four out of 10 patients with a positive score are actually depressed). Ultra-short tests appear to be, at best, a method for ruling out a diagnosis and should only be used when there are sufficient resources for second-stage assessment of those who screen positive.