Zidan Marwan, Thomas Ronald L, Slovis Thomas L
Children's Research Center of Michigan, Department of Pediatrics, Wayne State University School of Medicine, 3901 Beaubien Blvd., Detroit, MI, 48201, USA.
Pediatr Radiol. 2015 Mar;45(3):317-28. doi: 10.1007/s00247-014-2944-x. Epub 2015 Mar 1.
A diagnostic test is useful only if it is both reliable and accurate in its clinical diagnosis. This article is the second of a two-part series on validity and reliability; it discusses the assessment of reliability among raters of diagnostic tests and between diagnostic tests themselves. To examine reproducibility (reliability) among raters of diagnostic tests, we present the calculation of two statistical procedures: (1) the kappa coefficient for categorical data on the presence or absence of a clinical diagnosis and (2) the intraclass correlation coefficient (ICC) for continuously scaled data among raters. The accuracy among diagnostic tests (i.e., their interchangeability) can be evaluated by applying (1) a Bland-Altman plot with its 95% limits of agreement and (2) Passing-Bablok regression, which identifies and evaluates systematic and proportional differences. When deciding whether to select a diagnostic test, one must evaluate its ability to provide more precise information than a gold standard test and whether adopting it in clinical practice would be more beneficial for patients.
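As a rough illustration of two of the procedures named above, the following Python sketch computes Cohen's kappa for two raters and the Bland-Altman bias with its 95% limits of agreement for two measurement methods. It is a minimal sketch and not the article's own worked calculation: the function names and all data values are hypothetical, kappa is shown in its simplest two-rater unweighted form, and the limits of agreement assume the paired differences are approximately normally distributed. The ICC and Passing-Bablok regression are not shown.

```python
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed over labels.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def bland_altman_limits(method_1, method_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_1) != len(method_2) or len(method_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(method_1, method_2)]
    bias = mean(diffs)                 # mean difference between the methods
    half_width = 1.96 * stdev(diffs)   # 1.96 sample SDs of the differences
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Hypothetical readings: 1 = diagnosis present, 0 = diagnosis absent.
    reader_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
    print("kappa:", round(cohen_kappa(reader_1, reader_2), 3))

    # Hypothetical paired measurements from two tests on the same patients.
    test_1 = [10.0, 12.0, 14.0, 16.0]
    test_2 = [9.0, 12.0, 15.0, 15.0]
    bias, lower, upper = bland_altman_limits(test_1, test_2)
    print("bias:", round(bias, 3), "limits:", round(lower, 3), round(upper, 3))
```

In this reading, a kappa near 1 suggests agreement well beyond chance, and two tests might be treated as interchangeable only if the limits of agreement are narrow enough to be clinically acceptable, a threshold that has to be set on clinical grounds rather than by the calculation itself.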