From the R&D Center, VUNO, 507 Gangnamdae-ro, Seocho-gu, Seoul 06536, South Korea (J.S., W.B., B.P., E.J., K.H.J.); and Department of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, South Korea (S.P., S.M.L., J.B.S.).
Radiology. 2021 May;299(2):450-459. doi: 10.1148/radiol.2021202818. Epub 2021 Mar 23.
Background Previous studies assessing the effects of computer-aided detection on observer performance in the reading of chest radiographs used a sequential reading design that may have biased the results because of reading order or recall bias.
Purpose To compare observer performance in detecting and localizing major abnormal findings, including nodules, consolidation, interstitial opacity, pleural effusion, and pneumothorax, on chest radiographs without versus with deep learning-based detection (DLD) system assistance in a randomized crossover design.
Materials and Methods This study included normal and abnormal chest radiographs retrospectively collected between January 2016 and December 2017 (registration no. KCT0004147). The radiographs were randomized into two groups, and six observers, including thoracic radiologists, interpreted each radiograph without and with use of a commercially available DLD system by using a crossover design with a washout period. Jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), area under the receiver operating characteristic curve (AUC), sensitivity, specificity, false-positive findings per image, and reading times of observers with and without the DLD system were compared by using McNemar and paired t tests.
Results A total of 114 normal (mean patient age ± standard deviation, 51 years ± 11; 58 men) and 114 abnormal (mean patient age, 60 years ± 15; 75 men) chest radiographs were evaluated. The radiographs were randomized to two groups: group A (n = 114) and group B (n = 114). Use of the DLD system improved the observers' JAFROC FOM (from 0.90 to 0.95, P = .002), AUC (from 0.93 to 0.98, P = .002), per-lesion sensitivity (from 83% [822 of 990 lesions] to 89.1% [882 of 990 lesions], P = .009), per-image sensitivity (from 80% [548 of 684 radiographs] to 89% [608 of 684 radiographs], P = .009), and specificity (from 89.3% [611 of 684 radiographs] to 96.6% [661 of 684 radiographs], P = .01) and reduced the reading time (from 10-65 seconds to 6-27 seconds, P < .001). The DLD system alone outperformed the pooled observers (JAFROC FOM: 0.96 vs 0.90, respectively, P = .007; AUC: 0.98 vs 0.93, P = .003).
Conclusion Observers, including thoracic radiologists, showed improved performance in the detection and localization of major abnormal findings on chest radiographs and reduced reading time with use of a deep learning-based detection system. © RSNA, 2021
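
The pooled sensitivity and specificity figures in the Results can be recomputed from the reported counts. The sketch below does that arithmetic and also shows one common form of the McNemar test (the exact two-sided version) for paired readings. It is illustrative only: the abstract does not report discordant-pair counts or say which McNemar variant was used, so `mcnemar_exact_p` and any inputs passed to it are assumptions, not the authors' analysis.

```python
from math import comb

# Counts as reported in the Results (pooled over the six observers):
# (correct without DLD, total), (correct with DLD, total).
REPORTED = {
    "per-lesion sensitivity": ((822, 990), (882, 990)),
    "per-image sensitivity": ((548, 684), (608, 684)),
    "specificity": ((611, 684), (661, 684)),
}


def percent(numerator, denominator):
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


def summarize(reported):
    """Map each metric to (without %, with %, gain in percentage points)."""
    rows = {}
    for name, ((a, n_without), (b, n_with)) in reported.items():
        without, with_dld = percent(a, n_without), percent(b, n_with)
        rows[name] = (
            round(without, 1),
            round(with_dld, 1),
            round(with_dld - without, 1),
        )
    return rows


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar P value (hypothetical helper).

    b and c are the two discordant counts: readings correct only
    without assistance and readings correct only with assistance.
    These counts are NOT given in the abstract.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for metric, (without, with_dld, gain) in summarize(REPORTED).items():
        print(f"{metric}: {without}% -> {with_dld}% (+{gain} points)")
```

The recomputed values agree with the rounded percentages quoted in the Results. The reported P values cannot be reproduced from the abstract alone, because the paired reading-level data are not given.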