Department of Statistics, SungKyunKwan University, Jongno-gu, South Korea.
Department of Statistics, University of South Carolina, Columbia, South Carolina, USA.
Stat Med. 2021 Jul 30;40(17):4014-4033. doi: 10.1002/sim.9011. Epub 2021 May 9.
Diagnostic tests frequently rely on the interpretation of images by skilled raters. In many clinical settings, however, variability between experts' ratings undermines confidence in these interpretations and introduces uncertainty into the diagnostic process. For example, in breast cancer testing, radiologists interpret mammographic images, while pathologists examine breast biopsy results. Each of these procedures involves elements of subjectivity. We propose a flexible two-stage Bayesian latent variable model to investigate how the skills of individual raters affect the diagnostic accuracy of image-related testing in large-scale medical testing studies. A strength of the proposed model is that it applies whether or not a patient's true disease status is known within a reasonable time frame. In these studies, many raters each classify a large sample of patients using a defined ordinal grading scale, leading to a complex correlation structure between ratings. In contrast to currently available methods, our modeling approach considers the different sources of variability contributed by experts and patients while accounting for correlations present between ratings and patients. We propose a novel measure of a rater's ability (magnifier) that, in contrast to conventional measures of sensitivity and specificity, is robust to the underlying prevalence of disease in the population, providing an alternative measure of diagnostic accuracy across patient populations. Extensive simulation studies demonstrate lower bias in the estimation of parameters and measures of accuracy, and show that the proposed model outperforms existing models. Receiver operating characteristic curves are derived to assess the diagnostic accuracy of individual experts and their overall performance.
Our proposed modeling approach is applied to a large breast imaging study for known disease status and a uterine cancer dataset for unknown disease status.
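As a rough illustration of how ordinal ratings from a single rater relate to a receiver operating characteristic curve when disease status is known, the sketch below computes an empirical curve by thresholding the rating scale at each cut-point. It is not the model-based derivation described in the abstract: the function names, the five-category scale, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def empirical_roc(ratings: List[int], disease: List[int],
                  n_categories: int) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    A patient is called "positive" at cut-point c when rating >= c.
    Cut-points run from n_categories + 1 (nobody positive) down to 1
    (everybody positive), so the curve goes from (0, 0) to (1, 1).
    """
    n_pos = sum(disease)
    n_neg = len(disease) - n_pos
    points = []
    for c in range(n_categories + 1, 0, -1):
        tp = sum(1 for r, d in zip(ratings, disease) if d == 1 and r >= c)
        fp = sum(1 for r, d in zip(ratings, disease) if d == 0 and r >= c)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through the given points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: one rater, ten patients, ordinal scale 1..5,
# disease status coded 1 = diseased, 0 = not diseased.
ratings = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
disease = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

points = empirical_roc(ratings, disease, n_categories=5)
auc = auc_trapezoid(points)
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.92"
```

This empirical curve treats each rater in isolation and requires a known disease status; the approach in the abstract instead derives the curves from a latent variable model that pools information across raters and patients and can also be used when disease status is unknown.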