Rohan Kelly J, Rough Jennifer N, Evans Maggie, Ho Sheau-Yan, Meyerhoff Jonah, Roberts Lorinda M, Vacek Pamela M
Department of Psychological Science, University of Vermont, Burlington, VT, United States.
J Affect Disord. 2016 Aug;200:111-8. doi: 10.1016/j.jad.2016.01.051. Epub 2016 Apr 20.
We present a fully articulated protocol for the Hamilton Rating Scale for Depression (HAM-D), including item scoring rules, rater training procedures, and a data management algorithm to increase accuracy of scores prior to outcome analyses. The latter flags potentially inaccurate scores as interviews in which two independent raters are discrepant, either in total score (>=5-point difference) or in whether the interview meets threshold for depression recurrence status, a long-term treatment outcome with public health significance. Discrepancies are resolved by assigning two new raters, identifying items with disagreement per an algorithm, and reaching consensus on the most accurate scores for those items.
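The flagging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' algorithm: the 5-point gap comes from the abstract, but the recurrence criterion is not given here, so `recurrence_cutoff` is a hypothetical total-score cutoff, and the function names and the simple item-by-item comparison are likewise invented for the example.

```python
from typing import List, Mapping

SCORE_GAP = 5  # from the abstract: a >=5-point difference between raters


def total(items: Mapping[str, int]) -> int:
    """Total score as the sum of the item scores."""
    return sum(items.values())


def is_discrepant(rater_a: Mapping[str, int],
                  rater_b: Mapping[str, int],
                  recurrence_cutoff: int) -> bool:
    """Flag an interview when the raters differ by >=5 points, or when
    only one of them places the patient at or above the cutoff.

    `recurrence_cutoff` is a hypothetical stand-in; the abstract does not
    state how recurrence status is defined.
    """
    a, b = total(rater_a), total(rater_b)
    large_gap = abs(a - b) >= SCORE_GAP
    status_split = (a >= recurrence_cutoff) != (b >= recurrence_cutoff)
    return large_gap or status_split


def disagreeing_items(rater_a: Mapping[str, int],
                      rater_b: Mapping[str, int]) -> List[str]:
    """Items the two raters scored differently (assumed comparison rule)."""
    names = set(rater_a) | set(rater_b)
    return sorted(n for n in names if rater_a.get(n) != rater_b.get(n))
```

In such a sketch, only the items returned by `disagreeing_items` for a flagged interview would go to the two new raters for consensus scoring.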
These methods were applied in a clinical trial where the primary outcome was the Structured Interview Guide for the Hamilton Rating Scale for Depression-Seasonal Affective Disorder version (SIGH-SAD), which includes the 21-item HAM-D and 8 items assessing atypical symptoms. The trial enrolled 177 seasonally depressed adult patients, who were interviewed at 10 time points across treatment and the 2-year follow-up interval, for a total of 1589 completed interviews, of which 1535 (96.6%) were archived.
Inter-rater reliability, measured by intraclass correlation coefficients (ICCs), ranged from .923 to .967. Only 86 interviews (5.6%) met criteria for a between-rater discrepancy. HAM-D items "Depressed Mood", "Work and Activities", "Middle Insomnia", and "Hypochondriasis" and Atypical items "Fatigability" and "Hypersomnia" contributed most to discrepancies.
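As a quick arithmetic check on the reported percentages, the short Python snippet below recomputes them from the counts given in the abstract. The variable names are illustrative. The check suggests that the 5.6% figure uses the 1535 archived interviews as its denominator, which is an inference from the numbers rather than something the abstract states.

```python
completed = 1589   # completed interviews
archived = 1535    # archived interviews
discrepant = 86    # interviews meeting discrepancy criteria

# Share of completed interviews that were archived.
archived_pct = round(100 * archived / completed, 1)

# Discrepancy rate against each candidate denominator.
discrepant_of_archived = round(100 * discrepant / archived, 1)
discrepant_of_completed = round(100 * discrepant / completed, 1)

print(archived_pct)             # 96.6
print(discrepant_of_archived)   # 5.6
print(discrepant_of_completed)  # 5.4
```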
Generalizability beyond well-trained, experienced raters in a clinical trial is unknown.
Researchers might want to consider adopting this protocol in part or full. Clinicians might want to tailor it to their needs.