Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
Department of Psychological and Brain Sciences, University of California, Santa Barbara, CA, USA.
Med Decis Making. 2024 Oct;44(7):828-842. doi: 10.1177/0272989X241264572. Epub 2024 Jul 30.
To develop a model that simulates radiologist assessments and use it to explore whether pairing readers based on their individual performance characteristics could optimize screening performance.
Logistic regression models were designed to model individual radiologist assessments. For model evaluation, model-predicted individual performance metrics and paired disagreement rates were compared against the observed data using Pearson correlation coefficients. The logistic regression models were subsequently used to simulate screening programs in which readers were paired based on their individual true-positive rates (TPR) and/or false-positive rates (FPR). For this analysis, retrospective results from breast cancer screening programs employing double reading in Sweden, England, and Norway were used. Outcomes of random pairing were compared against those of pairs composed of readers with similar and opposite TPRs/FPRs, with a positive assessment defined as either reader flagging an examination as abnormal.
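The pairing comparison described above can be illustrated with a minimal sketch. It is not the study's model: the reader intercepts, the case offsets, and the assumption that two readers flag a given case independently are all hypothetical choices made for illustration. Each reader's probability of flagging a cancer-free examination is a logistic function of a reader intercept plus a case offset, and a pair is positive when either reader flags the case.

```python
import math
from itertools import combinations

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_positive_rate(p_a, p_b):
    # Either-reader-positive rule, assuming the two readers flag a case
    # independently given the case (an illustrative simplification).
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)

def program_fpr(pairs, reader_bias, case_logits):
    # Mean paired false-positive rate over cancer-free cases,
    # with every pair reading every case.
    total = 0.0
    for a, b in pairs:
        for z in case_logits:
            total += pair_positive_rate(sigmoid(reader_bias[a] + z),
                                        sigmoid(reader_bias[b] + z))
    return total / (len(pairs) * len(case_logits))

# Hypothetical reader intercepts (higher means more recalls) and
# hypothetical offsets for three cancer-free cases; not study data.
reader_bias = [-4.0, -3.6, -3.2, -2.8, -2.4, -2.0]
case_logits = [-1.0, 0.0, 1.0]

ranked = sorted(range(len(reader_bias)), key=lambda r: reader_bias[r])
# Similar pairing: neighbours in the FPR ranking.
similar_pairs = [(ranked[i], ranked[i + 1]) for i in range(0, len(ranked), 2)]
# Opposite pairing: lowest-FPR reader with highest-FPR reader, and so on.
opposite_pairs = [(ranked[i], ranked[-1 - i]) for i in range(len(ranked) // 2)]
# Random pairing in expectation: average over all possible pairs.
all_pairs = list(combinations(range(len(reader_bias)), 2))

fpr_similar = program_fpr(similar_pairs, reader_bias, case_logits)
fpr_random = program_fpr(all_pairs, reader_bias, case_logits)
fpr_opposite = program_fpr(opposite_pairs, reader_bias, case_logits)
print(f"similar={fpr_similar:.4f} random={fpr_random:.4f} opposite={fpr_opposite:.4f}")
```

Under these assumptions the ordering follows from the pair rate p_a + p_b - p_a·p_b: the sum of individual rates is the same for every pairing, so the paired FPR is lowest when the product term is largest, which happens when readers with similar rates are paired. The sketch shows only this mechanism and does not reproduce the reported percentages.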
The analysis data sets consisted of 936,621 (Sweden), 435,281 (England), and 1,820,053 (Norway) examinations. There was good agreement between the model-predicted and observed radiologists' TPR and FPR (r ≥ 0.969). Model-predicted negative-case disagreement rates showed high correlations with the observed rates (r ≥ 0.709), whereas positive-case disagreement rates had lower correlation levels due to sparse data (r ≥ 0.532). Pairing radiologists with similar FPR characteristics (Sweden: 4.50% [95% confidence interval: 4.46%-4.54%], England: 5.51% [5.47%-5.56%], Norway: 8.03% [7.99%-8.07%]) resulted in significantly lower FPR than random pairing (Sweden: 4.74% [4.70%-4.78%], England: 5.76% [5.71%-5.80%], Norway: 8.30% [8.26%-8.34%]), reducing examinations sent to consensus/arbitration while the TPR did not change significantly. Other pairing strategies resulted in equal or worse performance than random pairing.
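The agreement check between model-predicted and observed reader metrics can be sketched as a Pearson correlation over per-reader values. The rates below are made-up placeholders, not values from the study, and `pearson_r` is a hypothetical helper written out here rather than taken from any library.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder per-reader false-positive rates (illustrative only).
observed_fpr = [0.021, 0.034, 0.047, 0.052, 0.068]
predicted_fpr = [0.023, 0.033, 0.045, 0.055, 0.066]

r = pearson_r(observed_fpr, predicted_fpr)
print(f"r = {r:.3f}")
```

The same calculation would apply to TPR or to paired disagreement rates, with one value per reader or per reader pair.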
Logistic regression models accurately predicted screening mammography assessments and helped explore different radiologist pairing strategies. Pairing readers with similar modeled FPR characteristics reduced the number of examinations unnecessarily sent to consensus/arbitration without significantly compromising the TPR.
A logistic regression model can be derived that accurately predicts individual and paired reader performance during mammography screening reading.
Pairing screening mammography radiologists with similar false-positive characteristics reduced false-positive rates with no significant loss in true positives and may reduce the number of examinations unnecessarily sent to consensus/arbitration.