Department of Radiology, Seoul National University Hospital and College of Medicine, Seoul, 03080, Republic of Korea.
Department of Radiology, Ewha Womans University Seoul Hospital, Seoul, 07804, Republic of Korea.
Eur Radiol. 2022 Jan;32(1):213-222. doi: 10.1007/s00330-021-08162-8. Epub 2021 Jul 15.
To explore the value of a deep learning-based algorithm in detecting Lung CT Screening Reporting and Data System (Lung-RADS) category 4 nodules on chest radiographs from an asymptomatic health checkup population.
Data were retrospectively collected from an annual cohort of individuals who underwent chest radiography for health checkup purposes and chest CT within 3 months. Among 3073 individuals, 118 with category 4 nodules on CT were selected. A reader performance test was performed using the radiographs of these 118 individuals and of 51 randomly selected individuals without any nodules. Four radiologists independently evaluated the radiographs without and with the results of the algorithm, and sensitivities and specificities were compared. The detection rate was defined as the number of true-positive radiographs divided by the total number of radiographs, and the sample size needed to confirm the difference in detection rates was calculated.
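The pooled reader metrics defined above can be reproduced from the reading counts given in the results. The sketch below is illustrative only: the helper names (`pct`, `pooled_metrics`) are hypothetical, and it assumes, as the reported figures imply, that each of the 4 radiologists read every radiograph (472 = 118 × 4, 204 = 51 × 4) and that the cohort detection rate divides the true-positive readings by all 3073 × 4 = 12,292 readings.

```python
# Hypothetical constants taken from the abstract: every one of the 4
# radiologists read every radiograph, so counts are pooled over readers.
N_READERS = 4
N_POSITIVE = 118   # radiographs with category 4 nodules on CT
N_NEGATIVE = 51    # randomly selected radiographs without any nodules
COHORT = 3073      # individuals in the annual health-checkup cohort


def pct(num: int, den: int, digits: int = 1) -> float:
    """Return num/den as a percentage rounded to `digits` decimals."""
    return round(100 * num / den, digits)


def pooled_metrics(tp: int, tn: int) -> dict:
    """Pooled sensitivity, specificity and cohort detection rate (in %).

    tp: true-positive readings summed over all readers
    tn: true-negative readings summed over all readers
    """
    pos_reads = N_POSITIVE * N_READERS   # 472 readings of positive radiographs
    neg_reads = N_NEGATIVE * N_READERS   # 204 readings of negative radiographs
    cohort_reads = COHORT * N_READERS    # 12,292 readings of the whole cohort
    return {
        "sensitivity": pct(tp, pos_reads),
        "specificity": pct(tn, neg_reads),
        # Detection rate: true-positive radiographs / all radiographs
        "detection_rate": pct(tp, cohort_reads, digits=2),
    }


unaided = pooled_metrics(tp=183, tn=192)
aided = pooled_metrics(tp=213, tn=188)
print("without algorithm:", unaided)
print("with algorithm:   ", aided)
```

With these counts the rounded values match the reported sensitivities (38.8% and 45.1%), specificities (94.1% and 92.2%) and cohort detection rates (1.49% and 1.73%).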
With the aid of the algorithm, the radiologists' sensitivity increased substantially (38.8% [183/472] to 45.1% [213/472]; p < .001) without a significant change in specificity (94.1% [192/204] vs. 92.2% [188/204]; p = .22). Pooled radiologists detected more nodules with the algorithm (32.0% [156/488] without vs. 38.9% [190/488] with; p < .001), without any change in the false-positive rate (0.09 [62/676] in both conditions). Pooled detection rates for the annual cohort were 1.49% (183/12,292) and 1.73% (213/12,292) without and with the algorithm, respectively. A sample size of 41,776 in each arm would be required to demonstrate a significant difference in detection rate with a type I error < 5% and power > 80%.
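The abstract does not state which sample-size formula was used. As a rough cross-check, the sketch below swaps in the standard normal-approximation formula for comparing two independent proportions (two-sided test, unpooled variance, no continuity correction), with the two cohort detection rates as inputs. The function name `n_per_arm` is hypothetical. Under these assumptions the result is very close to, but not exactly, the reported 41,776 per arm; the small gap presumably reflects a different formula variant or software.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for comparing two independent proportions.

    Textbook normal-approximation formula (assumed, not necessarily the
    authors' method): two-sided test, unpooled variance, no continuity
    correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Cohort detection rates without and with the algorithm
p_unaided = 183 / 12_292
p_aided = 213 / 12_292

n = n_per_arm(p_unaided, p_aided)
print(f"per arm: {n:,}  total: {2 * n:,}")
```

Because the expected difference (about 0.24 percentage points) enters the denominator squared, even a modest shrink in the effect size would inflate the required sample size considerably.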
Although the algorithm substantially increased readers' sensitivity in detecting nodules on chest radiographs from a health checkup population, the difference in detection rate was only 0.24 percentage points, so a randomized controlled trial would require a sample size of more than 80,000.
• Aided by a deep learning algorithm, pooled radiologists improved their sensitivity in detecting Lung-RADS category 4 nodules on chest radiographs from a health checkup population (38.8% [183/472] to 45.1% [213/472]; p < .001) without increasing the false-positive rate.
• The prevalence of Lung-RADS category 4 nodules in the population was 3.8% (118/3073), so the detection rate of the algorithm-assisted radiologists increased by only 0.24 percentage points.
• Confirming a significant increase in detection rate with a randomized controlled trial would require a total sample size of approximately 84,000 (41,776 per arm).
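The second key point links prevalence to the small detection-rate gain. Because every reader read every radiograph, the cohort detection rate equals prevalence × sensitivity exactly (the 118 positives cancel), so a sensitivity gain of about 6.4 percentage points shrinks to about 0.24 points once multiplied by a 3.8% prevalence. The sketch below checks this with exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction

N_READERS = 4  # each radiologist read every radiograph

prevalence = Fraction(118, 3073)                  # category 4 nodules in the cohort
sens_unaided = Fraction(183, 118 * N_READERS)     # 183/472
sens_aided = Fraction(213, 118 * N_READERS)       # 213/472

# Detection rate = prevalence x sensitivity. This is exact here: the 118
# positives cancel, leaving true-positive readings / (3073 x 4) readings.
dr_unaided = prevalence * sens_unaided
dr_aided = prevalence * sens_aided
assert dr_unaided == Fraction(183, 3073 * N_READERS)

sens_gain = sens_aided - sens_unaided
dr_gain = dr_aided - dr_unaided
print(f"prevalence            {float(prevalence):.1%}")
print(f"sensitivity gain      {float(sens_gain):.1%}")
print(f"detection-rate gain   {float(dr_gain):.2%}")
```

Using `Fraction` avoids floating-point rounding, so the identity between the two ways of computing the detection rate can be asserted exactly.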