Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute/Harvard Medical School, Boston, Massachusetts.
Laboratory for Innovation Science at Harvard, Harvard University, Boston, Massachusetts.
JAMA Oncol. 2019 May 1;5(5):654-661. doi: 10.1001/jamaoncol.2019.0159.
IMPORTANCE: Radiation therapy (RT) is a critical cancer treatment, but the existing radiation oncologist workforce does not meet growing global demand. One key physician task in RT planning is tumor segmentation for targeting, which requires substantial training and is subject to significant interobserver variation.
OBJECTIVE: To determine whether crowd innovation could be used to rapidly produce artificial intelligence (AI) solutions that replicate the accuracy of an expert radiation oncologist in segmenting lung tumors for RT targeting.
DESIGN, SETTING, AND PARTICIPANTS: We conducted a 10-week, prize-based, online, 3-phase challenge (prizes totaled $55 000). A well-curated data set, including computed tomographic (CT) scans and lung tumor segmentations generated by an expert for clinical care, was used for the contest (CT scans from 461 patients; median of 157 images per scan; 77 942 images in total; 8144 images with tumor present). Contestants were provided a training set of 229 CT scans with accompanying expert contours to develop their algorithms and were given feedback on their performance throughout the contest, including from the expert clinician.
MAIN OUTCOMES AND MEASURES: The AI algorithms generated by contestants were automatically scored on an independent data set that was withheld from contestants, and performance was ranked using quantitative metrics that evaluated the overlap of each algorithm's automated segmentations with the expert's segmentations. Performance was further benchmarked against human expert interobserver and intraobserver variation.
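The primary overlap metric reported in the results, the Dice coefficient, is a standard measure of agreement between two binary segmentation masks, defined as twice the intersection divided by the sum of the mask sizes. A minimal sketch (function name and the empty-mask convention are our own, not from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks.

    Dice = 2 * |pred AND truth| / (|pred| + |truth|); ranges from 0 (no
    overlap) to 1 (identical masks).
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention (ours): two empty masks count as agreement
    return 2.0 * intersection / total
```

For example, a predicted mask covering two voxels that shares one voxel with a single-voxel expert mask gives Dice = 2·1/(2+1) ≈ 0.67.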
RESULTS: A total of 564 contestants from 62 countries registered for this challenge, and 34 (6%) submitted algorithms. The automated segmentations produced by the top 5 AI algorithms, when combined using an ensemble model, had an accuracy (Dice coefficient = 0.79) that was within the benchmark of mean interobserver variation measured between 6 human experts. For phase 1, the top 7 algorithms had average custom segmentation scores (S scores) on the holdout data set ranging from 0.15 to 0.38, with suboptimal performance on relative measures of error. The average S scores for phase 2 increased to 0.53 to 0.57, with similar improvement in the other performance metrics. In phase 3, performance of the top algorithm increased by an additional 9%. Combining the top 5 algorithms from phase 2 and phase 3 using an ensemble model yielded an additional 9% to 12% improvement in performance, with a final S score reaching 0.68.
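The paper does not specify in the abstract how the top 5 algorithms' segmentations were combined; a simple per-voxel majority vote is one common way to ensemble binary masks, sketched here purely as an illustration (the function name and threshold convention are assumptions, not the authors' method):

```python
import numpy as np

def ensemble_majority_vote(masks: list[np.ndarray]) -> np.ndarray:
    """Combine binary segmentation masks by per-voxel majority vote.

    A voxel is labeled tumor in the output only if more than half of the
    input masks label it tumor.
    """
    stacked = np.stack([m.astype(np.uint8) for m in masks])
    votes = stacked.sum(axis=0)  # number of masks voting "tumor" per voxel
    return (votes > len(masks) / 2).astype(np.uint8)
```

With three input masks, a voxel marked by two of the three is kept; a voxel marked by only one is dropped, which tends to suppress idiosyncratic errors of any single algorithm.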
CONCLUSIONS AND RELEVANCE: A combined crowd innovation and AI approach rapidly produced automated algorithms that replicated the skills of a highly trained physician for a critical task in radiation therapy. These AI algorithms could improve cancer care globally by transferring the skills of expert clinicians to under-resourced health care settings.