Binary and multi-category ratings in a laboratory observer performance study: a comparison.

Authors

Gur David, Bandos Andriy I, King Jill L, Klym Amy H, Cohen Cathy S, Hakim Christiane M, Hardesty Lara A, Ganott Marie A, Perrin Ronald L, Poller William R, Shah Ratan, Sumkin Jules H, Wallace Luisa P, Rockette Howard E

Affiliation

Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA.

Publication

Med Phys. 2008 Oct;35(10):4404-9. doi: 10.1118/1.2977766.

DOI:10.1118/1.2977766

PMID:18975686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2627510/

Abstract

The authors investigated radiologists' performance during retrospective interpretation of screening mammograms when making a binary decision (whether or not to recall a woman for additional procedures) and compared it with their receiver operating characteristic (ROC) type performance curves obtained with a semi-continuous rating scale. Under an Institutional Review Board approved protocol, nine experienced radiologists independently rated an enriched set of 155 examinations that they had not personally read in the clinic, mixed with other enriched sets of examinations that they had individually read in the clinic, using both a screening BI-RADS rating scale (recall/not recall) and a semi-continuous ROC-type rating scale (0 to 100). The vertical distance between the empirical ROC curve and the binary operating point, namely the difference in sensitivity at the same specificity, was computed for each reader. The vertical distance averaged over all readers was used to assess the proximity of the performance levels under the binary and ROC-type rating scales. The readers showed no systematic tendency towards better performance with either rating approach: four readers performed better using the semi-continuous rating scale, four performed better with the binary scale, and one had the binary point exactly on the empirical ROC curve. Only one of the nine readers had a binary "operating point" that was statistically distant from the same reader's empirical ROC curve. Reader-specific differences ranged from -0.046 to 0.128, with an average width of the corresponding 95% confidence intervals of 0.2 and p-values for individual readers ranging from 0.050 to 0.966. On average, radiologists performed similarly when using the two rating scales, in that the average distance between an individual reader's binary operating point and that reader's ROC curve was close to zero. The fixed-reader average was 0.016, with a 95% confidence interval of (-0.0206, 0.0631) (two-sided p-value 0.35). In conclusion, the authors found that in retrospective observer performance studies the use of a binary response or a semi-continuous rating scale led to consistent results in terms of performance as measured by sensitivity-specificity operating points.
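
The vertical-distance measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the linear interpolation between empirical ROC points, the choice of the upper value at a vertical jump, and the sign convention (ROC sensitivity minus binary sensitivity, so a positive value favors the semi-continuous scale) are all assumptions made for illustration, and the toy data are invented.

```python
from typing import List, Sequence, Tuple


def empirical_roc(ratings: Sequence[float], truth: Sequence[int]) -> List[Tuple[float, float]]:
    """Return empirical ROC points as (FPR, sensitivity), from (0, 0) to (1, 1)."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest rating down; a case is called
    # positive when its rating is at or above the threshold.
    for t in sorted(set(ratings), reverse=True):
        tp = sum(1 for r, y in zip(ratings, truth) if y == 1 and r >= t)
        fp = sum(1 for r, y in zip(ratings, truth) if y == 0 and r >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_sensitivity_at(points: List[Tuple[float, float]], fpr: float) -> float:
    """Sensitivity of the piecewise-linear empirical ROC curve at a given FPR.

    Assumption: at a vertical jump the upper value is used.
    """
    candidates = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                candidates.append(max(y0, y1))
            else:
                candidates.append(y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    if not candidates:
        raise ValueError("fpr must lie in [0, 1]")
    return max(candidates)


def vertical_distance(ratings: Sequence[float], recall: Sequence[int], truth: Sequence[int]) -> float:
    """ROC sensitivity minus binary sensitivity at the binary point's specificity."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    # Binary operating point from the recall / no-recall decisions.
    sens_binary = sum(1 for d, y in zip(recall, truth) if y == 1 and d == 1) / n_pos
    fpr_binary = sum(1 for d, y in zip(recall, truth) if y == 0 and d == 1) / n_neg
    return roc_sensitivity_at(empirical_roc(ratings, truth), fpr_binary) - sens_binary


# Toy data for one hypothetical reader (invented, not from the study).
truth = [1, 1, 1, 1, 0, 0, 0, 0]            # 1 = actually positive case
ratings = [90, 80, 60, 30, 70, 40, 20, 10]  # semi-continuous 0-100 ratings
recall = [1, 1, 0, 0, 1, 0, 0, 0]           # binary recall decisions
distance = vertical_distance(ratings, recall, truth)
```

Averaging such per-reader distances gives the kind of fixed-reader summary the abstract reports; the confidence intervals and p-values would need the authors' statistical method, which the abstract does not specify.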
