Kobak Kenneth A, Brown Brianne, Sharp Ian, Levy-Mack Hollie, Wells Kurrie, Ockun Felice, Williams Janet B W
MedAvante Research Institute, Madison, WI, USA.
J Clin Psychopharmacol. 2009 Feb;29(1):82-5. doi: 10.1097/JCP.0b013e318192e4d7.
Good interrater reliability is essential to minimize error variance and improve study power. Reasons why raters differ in scoring the same patient include information variance (different information obtained because of asking different questions), observation variance (the same information is obtained, but raters differ in what they notice and remember), interpretation variance (differences in the significance attached to what is observed), criterion variance (different criteria used to score items), and subject variance (true differences in the subject). We videotaped and transcribed 30 pairs of interviews to examine the most common sources of rater unreliability.
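In this framework, interrater reliability is the share of observed-score variance attributable to true differences among subjects; the four rater-driven sources above all inflate the error term. As a minimal sketch, assuming a standard one-way random-effects model (the abstract does not state which model the study used):

```latex
% Score of subject i by rater j: true subject effect plus rater error.
% Information, observation, interpretation, and criterion variance all
% load on the error term e_{ij}; subject variance sigma_s^2 is the signal.
\[
X_{ij} = \mu + s_i + e_{ij},
\qquad
\mathrm{ICC} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_e^2}.
\]
```

Anything that shrinks the error variance relative to the subject variance, such as calibration training, pushes the ICC toward 1 and improves study power at a fixed sample size.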
Thirty patients with depression were independently interviewed by 2 different raters on the same day. Raters provided rationales for their scoring, and independent assessors reviewed the rationales, the interview transcripts, and the videotapes to code the main reason for each discrepancy. One third of the interviews were conducted by raters who had never administered the Hamilton Depression Rating Scale; one third, by raters who were experienced but not calibrated; and one third, by raters who were both experienced and calibrated.
Experienced and calibrated raters had the highest interrater reliability (intraclass correlation coefficient [ICC] = 0.93), followed by inexperienced raters (ICC = 0.77) and experienced but uncalibrated raters (ICC = 0.55). The most common source of disagreement was interpretation variance (39%), followed by information variance (30%), criterion variance (27%), and observation variance (4%). Experienced and calibrated raters showed significantly less criterion variance than the other two cohorts (P = 0.001).
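The abstract does not specify which ICC form was computed; for illustration, the sketch below implements the common two-way random-effects, single-rater form, ICC(2,1) of Shrout and Fleiss, on simulated paired ratings. The data, variable names, and parameter values here are assumptions for demonstration, not the study's data.

```python
import numpy as np

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x: (n_subjects, n_raters) matrix of scores, e.g. paired HAM-D totals.
    """
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-subject means
    col_means = x.mean(axis=0)   # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)  # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)  # between raters
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative only: 30 subjects each rated by 2 raters, as in the study design.
rng = np.random.default_rng(0)
true_severity = rng.normal(18, 5, size=30)                   # subject variance
scores = true_severity[:, None] + rng.normal(0, 2, (30, 2))  # rater error
print(round(icc_2_1(scores), 2))  # small error variance -> ICC near 0.9
```

Calibration raises the ICC by shrinking the rater-error terms in the denominator, which is consistent with the ordering of the three cohorts reported above.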
Reasons for disagreement varied with level of experience and calibration. Experienced but uncalibrated raters should focus on establishing common scoring conventions, whereas experienced and calibrated raters should focus on fine-tuning judgment calls at symptom thresholds. Calibration training seems to improve reliability beyond experience alone: experienced raters without cohort calibration had lower reliability than inexperienced raters.