University Animal Hospital, Swedish University of Agricultural Sciences, Uppsala, Sweden.
XL Vet AB, Postvägen 7, Örbyhus, Sweden.
Sci Rep. 2022 Aug 17;12(1):13916. doi: 10.1038/s41598-022-18364-9.
Variation in the diagnostic interpretation of radiographs is a well-recognised problem in human and veterinary medicine. One common solution is to create a 'consensus' score based on a majority or unanimous decision from multiple observers. While consensus approaches are generally assumed to improve diagnostic repeatability, the extent to which consensus scores are themselves repeatable has rarely been examined. Here we use repeated assessments by three radiologists of 196 hip radiographs from 98 cats within a health-screening programme to examine intra-observer, inter-observer, majority-consensus and unanimous-consensus repeatability scores for feline hip dysplasia. In line with other studies, intra-observer and inter-observer repeatability was moderate (63-71%), and related to the reference assessment and time taken to reach a decision. Consensus scores did show reduced variation between assessments compared to individuals, but consensus repeatability was far from perfect. Only 75% of majority consensus scores were in agreement between assessments, and based on Bayesian multinomial modelling we estimate that unanimous consensus scores can have repeatabilities as low as 83%. These results clearly show that consensus scores in radiology can have large uncertainties, and that future studies in both human and veterinary medicine need to include consensus-uncertainty estimates if we are to properly interpret radiological diagnoses and the extent to which consensus scores improve diagnostic accuracy.
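To make the consensus definitions concrete, the sketch below shows how majority-consensus and unanimous-consensus scores could be derived from three observers, and how their repeatability could be computed as simple percent agreement between two assessment rounds. It rests on several assumptions that the abstract does not confirm. The integer grades (0 = normal up to 4 = severe) and the example data are hypothetical. Cases that lack a consensus in either round are dropped before agreement is computed. The study's Bayesian multinomial uncertainty modelling is not reproduced here; this computes raw agreement only.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical ordinal hip grades (0 = normal ... 4 = severe); the study's
# actual grading scheme is not stated in the abstract.


def majority_consensus(scores: Sequence[int]) -> Optional[int]:
    """Return the grade given by more than half of the observers, else None."""
    grade, count = Counter(scores).most_common(1)[0]
    return grade if count > len(scores) / 2 else None


def unanimous_consensus(scores: Sequence[int]) -> Optional[int]:
    """Return the grade if every observer gave the same grade, else None."""
    return scores[0] if len(set(scores)) == 1 else None


def repeatability(first: Sequence[Optional[int]],
                  second: Sequence[Optional[int]]) -> float:
    """Share of cases scored in both rounds whose scores match.

    Assumption: cases with no consensus (None) in either round are excluded.
    """
    pairs = [(a, b) for a, b in zip(first, second)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no cases with a score in both rounds")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical data: each tuple is one radiograph graded by three observers.
round1 = [(0, 0, 1), (2, 2, 2), (1, 3, 1), (0, 1, 2), (3, 3, 3), (1, 1, 2)]
round2 = [(0, 0, 0), (2, 1, 2), (3, 3, 1), (0, 1, 1), (3, 3, 3), (1, 2, 2)]

maj1: List[Optional[int]] = [majority_consensus(r) for r in round1]
maj2: List[Optional[int]] = [majority_consensus(r) for r in round2]
unan1: List[Optional[int]] = [unanimous_consensus(r) for r in round1]
unan2: List[Optional[int]] = [unanimous_consensus(r) for r in round2]

print(f"majority-consensus repeatability:  {repeatability(maj1, maj2):.2f}")
print(f"unanimous-consensus repeatability: {repeatability(unan1, unan2):.2f}")
```

Note that with three observers a unanimous consensus is reached far less often than a majority, so under this exclusion rule the unanimous figure can rest on very few cases. That is one reason an uncertainty estimate around a consensus repeatability score is informative.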