Mitchell Joseph Ross, Kamnitsas Konstantinos, Singleton Kyle W, Whitmire Scott A, Clark-Swanson Kamala R, Ranjbar Sara, Rickertsen Cassandra R, Johnston Sandra K, Egan Kathleen M, Rollison Dana E, Arrington John, Krecke Karl N, Passe Theodore J, Verdoorn Jared T, Nagelschneider Alex A, Carr Carrie M, Port John D, Patton Alice, Campeau Norbert G, Liebo Greta B, Eckel Laurence J, Wood Christopher P, Hunt Christopher H, Vibhute Prasanna, Nelson Kent D, Hoxworth Joseph M, Patel Ameet C, Chong Brian W, Ross Jeffrey S, Boxerman Jerrold L, Vogelbaum Michael A, Hu Leland S, Glocker Ben, Swanson Kristin R
H. Lee Moffitt Cancer Center and Research Institute, Department of Machine Learning, Tampa, Florida, United States.
Imperial College, Biomedical Image Analysis Group, London, United Kingdom.
J Med Imaging (Bellingham). 2020 Sep;7(5):055501. doi: 10.1117/1.JMI.7.5.055501. Epub 2020 Oct 16.
Deep learning (DL) algorithms have shown promising results for brain tumor segmentation in MRI. However, validation is required prior to routine clinical use. We report the first randomized and blinded comparison of DL and trained technician segmentations. We compiled a multi-institutional database of 741 pretreatment MRI exams. Each contained a postcontrast T1-weighted exam, a T2-weighted fluid-attenuated inversion recovery exam, and at least one technician-derived tumor segmentation. The database included 729 unique patients (470 males and 259 females). Of these exams, 641 were used for training the DL system, and 100 were reserved for testing. We developed a platform to enable qualitative, blinded, controlled assessment of lesion segmentations made by technicians and the DL method. On this platform, 20 neuroradiologists performed 400 side-by-side comparisons of segmentations on 100 test cases. They scored each segmentation between 0 (poor) and 10 (perfect). Agreement between segmentations from technicians and the DL method was also evaluated quantitatively using the Dice coefficient, which produces values between 0 (no overlap) and 1 (perfect overlap). The neuroradiologists gave technician and DL segmentations mean scores of 6.97 and 7.31, respectively. The DL method achieved a mean Dice coefficient of 0.87 on the test cases. This was the first objective comparison of automated and human segmentation using a blinded controlled assessment study. Our DL system learned to outperform its "human teachers" and produced output that was better, on average, than its training data.
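For readers unfamiliar with the Dice coefficient mentioned in the abstract, the following is a minimal sketch of how it is commonly computed for two binary segmentation masks, as 2|A∩B| / (|A| + |B|). It is not the authors' code: the function name `dice_coefficient`, the flattened 0/1 list representation, the toy masks, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat sequences of 0/1 values.

    Returns 2*|A and B| / (|A| + |B|), a value between 0 (no overlap)
    and 1 (perfect overlap).
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of voxels")

    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


# Hypothetical toy masks (a 2x3 image flattened to 6 voxels), not study data.
technician = [0, 1, 1, 0, 1, 0]
model = [0, 1, 0, 0, 1, 1]

# Two of the three labeled voxels coincide in each mask: 2*2 / (3+3).
print(round(dice_coefficient(technician, model), 3))  # 0.667
```

Because the measure only counts overlapping voxels, it quantifies agreement between two segmentations without indicating which one is closer to the true tumor boundary, which is why the study pairs it with blinded ratings from neuroradiologists.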