Department of Radiology, University of Pittsburgh, Presbyterian South Tower, Room 4771, 200 Lothrop Street, Pittsburgh, PA 15213, USA.
Acad Radiol. 2012 Dec;19(12):1474-83. doi: 10.1016/j.acra.2012.09.002. Epub 2012 Oct 4.
Studies of medical image interpretation have focused either on assessing radiologists' performance using, for example, the receiver operating characteristic (ROC) paradigm, or on assessing the interpretive process by analyzing their eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures of merit (FOMs) analogous to the area under the ROC curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between FOMs derived from free-response ROC (FROC) and ET data.
Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attracted prolonged (>800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell time and the approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife, and 95% confidence intervals (CI) for the FOMs and the agreement were estimated using the bootstrap.
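As an illustration only, the kind of FOM described above can be sketched as a Wilcoxon-type statistic with a percentile bootstrap interval. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the commonly used formulation in which each lesion rating is compared with the highest-rated non-lesion mark on each normal case, unmarked lesions and unmarked cases receive a rating of negative infinity, and ties count one half. The function names and the independent resampling of normal cases and lesions are hypothetical simplifications. The same functions could take mark ratings, dwell-based ratings, or approach-rate ratings as input.

```python
import random

# Assumed convention: an unmarked lesion or an unmarked normal case
# receives the lowest possible rating.
UNMARKED = float("-inf")


def wilcoxon_kernel(fp_rating, lesion_rating):
    """Return 1 if the lesion outranks the false positive, 0.5 on a tie, else 0."""
    if lesion_rating > fp_rating:
        return 1.0
    if lesion_rating == fp_rating:
        return 0.5
    return 0.0


def afroc_fom(highest_fp_ratings, lesion_ratings):
    """Wilcoxon-type estimate of the trapezoidal AFROC area.

    highest_fp_ratings: one value per normal case (highest-rated
        non-lesion mark on that case, or UNMARKED).
    lesion_ratings: one value per lesion (rating of the mark that
        localized it, or UNMARKED).
    """
    total = sum(
        wilcoxon_kernel(fp, lesion)
        for fp in highest_fp_ratings
        for lesion in lesion_ratings
    )
    return total / (len(highest_fp_ratings) * len(lesion_ratings))


def bootstrap_ci(highest_fp_ratings, lesion_ratings, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the FOM.

    Simplification (an assumption of this sketch): normal cases and
    lesions are resampled independently, with replacement.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        fp_sample = rng.choices(highest_fp_ratings, k=len(highest_fp_ratings))
        lesion_sample = rng.choices(lesion_ratings, k=len(lesion_ratings))
        stats.append(afroc_fom(fp_sample, lesion_sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up toy ratings, for illustration only (not data from the study).
toy_fp = [1, 2, UNMARKED]
toy_lesions = [3, UNMARKED]
toy_fom = afroc_fom(toy_fp, toy_lesions)
toy_ci = bootstrap_ci(toy_fp, toy_lesions, n_boot=500)
```

Because the statistic depends only on the rank ordering of the ratings, it does not depend on any particular reporting threshold, which is the property the abstract refers to as threshold-bias independence.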
The AFROC mark-ratings' FOM was largest at 0.734 (CI 0.65-0.81), followed by the dwell FOM at 0.460 (0.34-0.59) and then by the approach-rate FOM at 0.336 (0.25-0.46). The differences between the FROC mark-ratings' FOM and the perceptual FOMs were significant (P < .05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63-0.88), dwell vs. approach-rate 0.703 (0.60-0.79), and ratings vs. approach-rate 0.606 (0.53-0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (P = .008).
Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.