Lee Sun Yeop, Ha Sangwoo, Jeon Min Gyeong, Li Hao, Choi Hyunju, Kim Hwa Pyung, Choi Ye Ra, I Hoseok, Jeong Yeon Joo, Park Yoon Ha, Ahn Hyemin, Hong Sang Hyup, Koo Hyun Jung, Lee Choong Wook, Kim Min Jae, Kim Yeon Joo, Kim Kyung Won, Choi Jong Mun
Department of Medical Artificial Intelligence, Deepnoid, Inc., Seoul, Republic of Korea.
Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea.
NPJ Digit Med. 2022 Jul 30;5(1):107. doi: 10.1038/s41746-022-00658-x.
While many deep-learning-based computer-aided detection (CAD) systems have been developed and commercialized for abnormality detection in chest radiographs (CXRs), their ability to localize a target abnormality is rarely reported. Localization accuracy matters for model interpretability, which is crucial in clinical settings. Moreover, diagnostic performance is likely to vary with the threshold that defines an accurate localization. In a multi-center, stand-alone clinical trial using temporal and external validation datasets comprising 1,050 CXRs, we evaluated the localization accuracy, localization-adjusted discrimination, and calibration of a commercially available deep-learning-based CAD for detecting consolidation and pneumothorax. For consolidation, the CAD achieved an image-level AUROC (95% CI) of 0.960 (0.945, 0.975), sensitivity of 0.933 (0.899, 0.959), specificity of 0.948 (0.930, 0.963), Dice coefficient of 0.691 (0.664, 0.718), and moderate calibration. For pneumothorax, it achieved an image-level AUROC of 0.978 (0.965, 0.991), sensitivity of 0.956 (0.923, 0.978), specificity of 0.996 (0.989, 0.999), Dice coefficient of 0.798 (0.770, 0.826), and moderate calibration. Diagnostic performance varied substantially when localization accuracy was taken into account but remained high at the minimum threshold of clinical relevance. In a separate trial of diagnostic impact using 461 CXRs, we estimated the causal effect of CAD assistance on clinicians' diagnostic performance. After adjustment for age, sex, dataset, and abnormality type, CAD assistance improved clinicians' diagnostic performance on average (OR [95% CI] = 1.73 [1.30, 2.32]; p < 0.001), although the effect varied substantially with clinical background. The CAD showed high stand-alone diagnostic performance and may benefit clinicians' diagnostic performance when used in clinical settings.
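The abstract notes that localization-adjusted performance depends on the threshold that defines an accurate localization. The sketch below is minimal and illustrative. It assumes binary pixel masks and a Dice-based matching rule: a detection counts as a true positive only if its mask is non-empty and reaches a chosen Dice threshold. The abstract does not state the study's exact matching rule, and the function names and toy masks are hypothetical. The sketch shows how the same set of detections yields lower sensitivity as the threshold rises.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a binary 2-D mask as nested lists of 0/1 ints,
# where 1 marks a pixel labelled (or predicted) as abnormal.
Mask = Sequence[Sequence[int]]


def mask_area(mask: Mask) -> int:
    """Number of positive pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def dice(pred: Mask, truth: Mask) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 1.0 when both masks are empty."""
    overlap = sum(p * t for p_row, t_row in zip(pred, truth)
                  for p, t in zip(p_row, t_row))
    total = mask_area(pred) + mask_area(truth)
    return 1.0 if total == 0 else 2.0 * overlap / total


def localization_adjusted_sensitivity(
        cases: List[Tuple[Mask, Mask]], threshold: float) -> float:
    """Sensitivity over abnormal images (non-empty reference mask), counting a
    detection as a true positive only if the predicted mask is non-empty AND
    its Dice with the reference reaches `threshold` (an assumed rule)."""
    abnormal = [(p, t) for p, t in cases if mask_area(t) > 0]
    hits = sum(1 for p, t in abnormal
               if mask_area(p) > 0 and dice(p, t) >= threshold)
    return hits / len(abnormal)


# Toy example: one reference lesion and three hypothetical CAD outputs.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]   # 4 reference pixels
good = [[0, 1, 1, 0],
        [0, 1, 0, 0]]    # 3 pixels, all overlapping -> Dice 6/7
poor = [[1, 0, 0, 0],
        [0, 1, 0, 0]]    # 2 pixels, 1 overlapping  -> Dice 1/3
miss = [[0, 0, 0, 0],
        [0, 0, 0, 0]]    # no detection

cases = [(good, truth), (poor, truth), (miss, truth)]
for thr in (0.0, 0.5, 0.9):
    print(thr, round(localization_adjusted_sensitivity(cases, thr), 3))
# 0.0 0.667
# 0.5 0.333
# 0.9 0.0
```

With a threshold of 0.0, any non-empty detection on an abnormal image counts. This approximates image-level sensitivity. Raising the threshold demands tighter overlap, which is why reported performance depends on where the threshold is set.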