Department of Applied Statistics, Social Science, and Humanities, New York University, United States; Center for the Promotion of Research at the Intersection of Information, Society, and Methodology, New York University, United States.
Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, United Kingdom.
Methods. 2022 Aug;204:300-311. doi: 10.1016/j.ymeth.2021.11.005. Epub 2021 Nov 12.
Shortened versions of self-report questionnaires can reduce respondent burden. When a shortened screening tool is used, it should ideally maintain diagnostic accuracy equivalent to that of the full-length form. This manuscript presents a case study illustrating how external data and individual participant data meta-analysis can be used to assess equivalence in diagnostic accuracy between a shortened and a full-length form. The case study compares the Patient Health Questionnaire-9 (PHQ-9) with a 4-item shortened version (PHQ-Dep-4) that was previously developed using optimal test assembly methods. Using a large database of 75 primary studies (34,698 participants, 3,392 major depression cases), we evaluated whether a PHQ-Dep-4 cutoff of ≥ 4 maintained diagnostic accuracy equivalent to a PHQ-9 cutoff of ≥ 10. In this external validation dataset, a PHQ-Dep-4 cutoff of ≥ 4 maximized the sum of sensitivity and specificity. For the semi-structured, fully structured, and MINI reference standard categories, respectively, sensitivity was 0.88 (95% CI 0.81, 0.93), 0.68 (95% CI 0.56, 0.78), and 0.80 (95% CI 0.73, 0.85), and specificity was 0.79 (95% CI 0.74, 0.83), 0.85 (95% CI 0.78, 0.90), and 0.83 (95% CI 0.80, 0.86). Although equivalence with a PHQ-9 cutoff of ≥ 10 was not established, the sensitivity of the PHQ-Dep-4 was non-inferior to that of the PHQ-9, and its specificity was marginally lower than that of the PHQ-9.
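The cutoff criterion named in the abstract, maximizing the sum of sensitivity and specificity, can be illustrated with a minimal Python sketch. It is not the authors' analysis: it works on a single pooled toy sample and does not reproduce the individual participant data meta-analysis model or the confidence intervals. The function names `sens_spec` and `best_cutoff`, the toy scores, and the assumed PHQ-Dep-4 score range of 0 to 12 (four items scored 0 to 3) are all illustrative assumptions.

```python
from typing import Iterable, Sequence, Tuple


def sens_spec(scores: Sequence[int], is_case: Sequence[bool], cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive screen."""
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)      # cases screened positive
    fn = sum(1 for s, c in pairs if c and s < cutoff)       # cases missed
    tn = sum(1 for s, c in pairs if not c and s < cutoff)   # non-cases screened negative
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)  # non-cases screened positive
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[int], is_case: Sequence[bool], cutoffs: Iterable[int]) -> int:
    """Return the cutoff that maximizes sensitivity + specificity (first one on ties)."""
    return max(cutoffs, key=lambda k: sum(sens_spec(scores, is_case, k)))


# Hypothetical toy data, NOT from the study: six non-cases followed by four cases.
toy_scores = [0, 1, 2, 3, 5, 2, 4, 6, 8, 11]
toy_is_case = [False] * 6 + [True] * 4

# Assumed score range 0..12 for a 4-item form with items scored 0..3.
chosen = best_cutoff(toy_scores, toy_is_case, range(0, 13))
sensitivity, specificity = sens_spec(toy_scores, toy_is_case, chosen)
print(chosen, round(sensitivity, 2), round(specificity, 2))
```

In a real evaluation the per-study estimates would be pooled with a meta-analytic model, and the reference standard categories (semi-structured, fully structured, MINI) would be analyzed separately, as in the reported results.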