University Health Network, Toronto, ON, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada.
AI Lab, Lenovo Research, Beijing, China.
Lancet Digit Health. 2024 Nov;6(11):e815-e826. doi: 10.1016/S2589-7500(24)00154-7.
Deep learning has shown great potential to automate abdominal organ segmentation and quantification. However, most existing algorithms rely on expert annotations and have not been comprehensively evaluated in real-world multinational settings. To address these limitations, we organised the FLARE 2022 challenge to benchmark fast, low-resource, and accurate abdominal organ segmentation algorithms. We first constructed an intercontinental abdomen CT dataset from more than 50 clinical research groups. We then independently validated that deep learning algorithms achieved a median Dice similarity coefficient (DSC) of 90·0% (IQR 87·4-91·3%) using 50 labelled images and 2000 unlabelled images, which can substantially reduce manual annotation costs. The best-performing algorithms successfully generalised to held-out external validation sets, achieving a median DSC of 89·4% (85·2-91·3%), 90·0% (84·3-93·0%), and 88·5% (80·9-91·9%) on North American, European, and Asian cohorts, respectively. These algorithms show the potential to use unlabelled data to boost performance and alleviate annotation shortages for modern artificial intelligence models.
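For readers unfamiliar with the metric, the DSC for one organ is twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below illustrates that definition and the median (IQR) summary used in the abstract. It is not the challenge's evaluation code: the function names `dice` and `summarise`, the flat label-list input, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
from statistics import median, quantiles


def dice(pred, truth, label):
    """Dice similarity coefficient for one organ label.

    `pred` and `truth` are equal-length flat sequences of integer voxel
    labels (a hypothetical input format chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    pred_size = sum(1 for p in pred if p == label)
    truth_size = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_size + truth_size == 0:
        # Assumption: an organ absent from both masks counts as full agreement.
        return 1.0
    return 2.0 * overlap / (pred_size + truth_size)


def summarise(scores):
    """Return (median, first quartile, third quartile) of per-case scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


if __name__ == "__main__":
    # Toy example: label 1 marks the organ, label 0 is background.
    prediction = [0, 1, 1, 0]
    reference = [0, 1, 0, 0]
    print(dice(prediction, reference, label=1))
    print(summarise([0.8, 0.9, 1.0]))
```

Quartile conventions differ between tools, so an IQR computed this way may not match one from another package to the last decimal place.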