Jacobs Colin, Setio Arnaud A A, Scholten Ernst T, Gerke Paul K, Bhattacharya Haimasree, M Hoesein Firdaus A, Brink Monique, Ranschaert Erik, de Jong Pim A, Silva Mario, Geurts Bram, Chung Kaman, Schalekamp Steven, Meersschaert Joke, Devaraj Anand, Pinsky Paul F, Lam Stephen C, van Ginneken Bram, Farahani Keyvan
Department of Radiology, Nuclear Medicine and Anatomy, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, the Netherlands (C.J., A.A.A.S., E.T.S., P.K.G., H.B., M.B., B.G., S.S., B.v.G.); Department of Digital Technology & Innovation, Siemens Healthineers, Erlangen, Germany (A.A.A.S.); Department of Radiology, University Medical Center Utrecht, Utrecht, the Netherlands (F.A.M.H., P.A.d.J.); ETZ (Elisabeth-TweeSteden Ziekenhuis), Tilburg, the Netherlands (E.R.); Section of Radiology, Department of Medicine and Surgery (DiMeC), University of Parma, Parma, Italy (M.S.); Department of Radiology, Meander Medical Center, Amersfoort, the Netherlands (K.C., S.S.); Department of Radiology, AZ Zeno, Knokke-Heist, Belgium (J.M.); Department of Imaging, Royal Brompton Hospital, London, England (A.D.); Division of Cancer Prevention (P.F.P.) and Center for Biomedical Informatics & Information Technology (K.F.), National Cancer Institute, National Institutes of Health, Bethesda, Md; British Columbia Cancer Agency and the University of British Columbia, Vancouver, Canada (S.C.L.); and Fraunhofer MEVIS, Bremen, Germany (B.v.G.).
Radiol Artif Intell. 2021 Oct 27;3(6):e210027. doi: 10.1148/ryai.2021210027. eCollection 2021 Nov.
To determine whether deep learning algorithms developed in a public competition could identify lung cancer on low-dose CT scans with a performance similar to that of radiologists.
In this retrospective study, a dataset of 300 patient scans was used for model assessment; 150 patient scans were from the competition set and 150 were from an independent dataset. Each of the two test datasets contained 50 cancer-positive scans and 100 cancer-negative scans. The reference standard was set by histopathologic examination for cancer-positive scans and imaging follow-up for at least 2 years for cancer-negative scans. The test datasets were applied to the three top-performing algorithms from the Kaggle Data Science Bowl 2017 public competition: grt123, Julian de Wit and Daniel Hammack (JWDH), and Aidence. Model outputs were compared with the results of an observer study in which 11 radiologists assessed the same test datasets. Each scan was scored on a continuous scale by both the deep learning algorithms and the radiologists. Performance was measured using multireader, multicase receiver operating characteristic analysis.
The area under the receiver operating characteristic curve (AUC) was 0.877 (95% CI: 0.842, 0.910) for grt123, 0.902 (95% CI: 0.871, 0.932) for JWDH, and 0.900 (95% CI: 0.870, 0.928) for Aidence. The average AUC of the radiologists was 0.917 (95% CI: 0.889, 0.945), which was significantly higher than that of grt123 (P = .02); however, no significant difference was found between the radiologists and JWDH (P = .29) or Aidence (P = .26).
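As a rough illustration of how an AUC and a 95% CI can be obtained from continuous per-scan scores, the following minimal sketch computes the AUC as a rank statistic and estimates a percentile bootstrap interval. It is not the multireader, multicase analysis used in the study: the bootstrap is a simpler, swapped-in technique, the function names auc_from_scores and bootstrap_auc_ci are hypothetical, and the demo scores are synthetic values that only mirror the 50 positive / 100 negative composition of one test dataset.

```python
import random
from typing import List, Sequence, Tuple


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC, resampling positives and
    negatives separately so every resample keeps the class balance.
    Assumption: a stand-in for the study's MRMC analysis, not a copy of it."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    rng = random.Random(seed)
    aucs: List[float] = []
    for _ in range(n_boot):
        p = rng.choices(pos, k=len(pos))  # sample with replacement
        n = rng.choices(neg, k=len(neg))
        aucs.append(auc_from_scores([1] * len(p) + [0] * len(n), p + n))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


if __name__ == "__main__":
    # Synthetic scores only: 50 cancer-positive and 100 cancer-negative scans.
    rng = random.Random(42)
    labels = [1] * 50 + [0] * 100
    scores = [rng.gauss(0.7, 0.2) for _ in range(50)] + [
        rng.gauss(0.4, 0.2) for _ in range(100)
    ]
    auc = auc_from_scores(labels, scores)
    low, high = bootstrap_auc_ci(labels, scores)
    print(f"AUC = {auc:.3f} (95% CI: {low:.3f}, {high:.3f})")
```

Resampling positives and negatives separately keeps the class balance fixed in every replicate. A single-reader bootstrap of this kind ignores reader variability, which is what a multireader, multicase analysis is designed to account for.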
Deep learning algorithms developed in a public competition for lung cancer detection in low-dose CT scans reached performance close to that of radiologists.
Keywords: Lung, CT, Thorax, Screening, Oncology. © RSNA, 2021.