Department of Radiology, Gil Medical Center, Gachon University College of Medicine, Incheon, South Korea.
Department of Biomedical Engineering, Gachon University College of Medicine, Incheon, South Korea.
PLoS One. 2021 Feb 19;16(2):e0246472. doi: 10.1371/journal.pone.0246472. eCollection 2021.
This study evaluated the performance of a commercially available deep-learning algorithm (DLA) (Insight CXR, Lunit, Seoul, South Korea) for referable thoracic abnormalities on chest X-ray (CXR) using a consecutively collected multicenter health screening cohort.
A consecutive health screening cohort of participants who underwent both CXR and chest computed tomography (CT) within 1 month was retrospectively collected from the health care clinics of three institutions (n = 5,887). Referable thoracic abnormalities were defined as any radiologic findings requiring further diagnostic evaluation or management, including DLA-target lesions of nodule/mass, consolidation, or pneumothorax. We evaluated the diagnostic performance of the DLA for referable thoracic abnormalities against a ground truth based on chest CT (CT-GT), using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and specificity. In addition, for CT-GT-positive cases, three radiologists independently read the CXRs. Abnormalities called by more than two radiologists were defined as clear visible and those called by at least one radiologist as visible, yielding two CXR-based ground truths (clear visible CXR-GT and visible CXR-GT, respectively) against which the DLA was also evaluated.
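The evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's analysis code: the function names, the toy scores, the 0.5 operating threshold, and the `min_readers` parameter are all assumptions introduced here. The AUC is computed with the pairwise (Mann-Whitney) formulation, which is one standard way to obtain it.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when a case is called positive if score >= threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cxr_gt(ct_positive: bool, reader_calls: Sequence[int], min_readers: int) -> int:
    """CXR-based ground truth: CT-GT positive AND called by at least `min_readers` readers.

    `min_readers` is a hypothetical parameter: 1 corresponds to the "visible"
    definition; the value for "clear visible" depends on how the reader
    threshold in the text is interpreted.
    """
    return int(ct_positive and sum(reader_calls) >= min_readers)


# Toy data (invented for illustration only, not study data).
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
toy_auc = auc(toy_scores, toy_labels)
```

Re-labelling the same cases with `cxr_gt` and re-running the two metric functions is how a single set of algorithm scores can yield different performance figures under different ground truths.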
Among 5,887 subjects (4,329 males; mean age 54±11 years), referable thoracic abnormalities were found in 618 (10.5%) based on CT-GT. DLA-target lesions were observed in 223 (4.0%), nodule/mass in 202 (3.4%), consolidation in 31 (0.5%), pneumothorax in 1 (<0.1%), and DLA-non-target lesions in 409 (6.9%). For referable thoracic abnormalities based on CT-GT, the DLA showed an AUC of 0.771 (95% confidence interval [CI], 0.751-0.791), a sensitivity of 69.6%, and a specificity of 74.0%. Based on CXR-GT, the prevalence of referable thoracic abnormalities decreased, with visible and clear visible abnormalities found in 405 (6.9%) and 227 (3.9%) cases, respectively. The performance of the DLA increased significantly when using CXR-GTs, with an AUC of 0.839 (95% CI, 0.829-0.848), a sensitivity of 82.7%, and a specificity of 73.2% based on visible CXR-GT, and an AUC of 0.872 (95% CI, 0.863-0.880; P <0.001 for the AUC comparison of CT-GT vs. clear visible CXR-GT), a sensitivity of 83.3%, and a specificity of 78.8% based on clear visible CXR-GT.
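As a quick arithmetic check, the prevalence under each ground truth follows directly from the reported case counts and the cohort size. The snippet below is a minimal sketch. The variable and function names are chosen here for illustration, and only the counts are taken from the text.

```python
TOTAL_SUBJECTS = 5887  # cohort size reported in the abstract

# Positive case counts reported under each ground-truth definition.
positive_counts = {
    "CT-GT": 618,
    "visible CXR-GT": 405,
    "clear visible CXR-GT": 227,
}


def prevalence_pct(count: int, total: int = TOTAL_SUBJECTS) -> float:
    """Prevalence as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


prevalence = {name: prevalence_pct(n) for name, n in positive_counts.items()}
```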
The DLA provided fair-to-good stand-alone performance for the detection of referable thoracic abnormalities in a multicenter consecutive health screening cohort. Its performance varied with the method used to define the ground truth.