Mehta Raghav, Filos Angelos, Baid Ujjwal, Sako Chiharu, McKinley Richard, Rebsamen Michael, Dätwyler Katrin, Meier Raphael, Radojewski Piotr, Murugesan Gowtham Krishnan, Nalawade Sahil, Ganesh Chandan, Wagner Ben, Yu Fang F, Fei Baowei, Madhuranthakam Ananth J, Maldjian Joseph A, Daza Laura, Gómez Catalina, Arbeláez Pablo, Dai Chengliang, Wang Shuo, Reynaud Hadrien, Mo Yuanhan, Angelini Elsa, Guo Yike, Bai Wenjia, Banerjee Subhashis, Pei Linmin, Ak Murat, Rosas-González Sarahi, Zemmoura Ilyess, Tauber Clovis, Vu Minh H, Nyholm Tufve, Löfstedt Tommy, Ballestar Laura Mora, Vilaplana Veronica, McHugh Hugh, Maso Talou Gonzalo, Wang Alan, Patel Jay, Chang Ken, Hoebel Katharina, Gidwani Mishka, Arun Nishanth, Gupta Sharut, Aggarwal Mehak, Singh Praveer, Gerstner Elizabeth R, Kalpathy-Cramer Jayashree, Boutry Nicolas, Huard Alexis, Vidyaratne Lasitha, Rahman Md Monibor, Iftekharuddin Khan M, Chazalon Joseph, Puybareau Elodie, Tochon Guillaume, Ma Jun, Cabezas Mariano, Llado Xavier, Oliver Arnau, Valencia Liliana, Valverde Sergi, Amian Mehdi, Soltaninejad Mohammadreza, Myronenko Andriy, Hatamizadeh Ali, Feng Xue, Dou Quan, Tustison Nicholas, Meyer Craig, Shah Nisarg A, Talbar Sanjay, Weber Marc-André, Mahajan Abhishek, Jakab Andras, Wiest Roland, Fathallah-Shaykh Hassan M, Nazeri Arash, Milchenko Mikhail, Marcus Daniel, Kotrotsou Aikaterini, Colen Rivka, Freymann John, Kirby Justin, Davatzikos Christos, Menze Bjoern, Bakas Spyridon, Gal Yarin, Arbel Tal
Centre for Intelligent Machines (CIM), McGill University, Montreal, QC, Canada.
Oxford Applied and Theoretical Machine Learning (OATML) Group, University of Oxford, Oxford, England.
J Mach Learn Biomed Imaging. 2022 Aug;2022.
Deep learning (DL) models have provided state-of-the-art performance in various medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder translating DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way toward clinical translation. Several uncertainty estimation methods have recently been introduced for DL medical image segmentation tasks. Developing scores to evaluate and compare the performance of uncertainty measures will assist the end-user in making more informed decisions. In this study, we explore and evaluate a score developed during the BraTS 2019 and BraTS 2020 tasks on uncertainty quantification (QU-BraTS), designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This score (1) rewards uncertainty estimates that assign high confidence to correct assertions and low confidence to incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent participating teams of QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, highlighting the need for uncertainty quantification in medical image analysis. Finally, in favor of transparency and reproducibility, our evaluation code is made publicly available at https://github.com/RagMeh11/QU-BraTS.
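The two properties of the score stated in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: the function name `uncertainty_score`, the 0 to 100 uncertainty scale, the threshold grid, and the way the three curves are averaged. Voxels whose uncertainty exceeds a threshold are filtered out; Dice on the remaining voxels rewards confident correct predictions and uncertain errors, while the fractions of filtered true positives and true negatives penalize under-confident correct assertions. The evaluation code in the repository linked in the abstract remains the authoritative implementation.

```python
import numpy as np


def _dice(pred, gt):
    """Dice overlap of two boolean arrays; defined as 1.0 when both are empty."""
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom


def _auc(x, y):
    """Trapezoidal area under y(x), normalised by the x range."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)) / (x[-1] - x[0]))


def uncertainty_score(pred, gt, unc, thresholds=np.arange(0, 101, 25)):
    """Hypothetical score in [0, 1]; higher means better uncertainty estimates.

    pred, gt : binary segmentation and reference for one tumor sub-region
    unc      : per-voxel uncertainty, assumed to lie in [0, 100]
    """
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    unc = np.asarray(unc, float)
    tp_all = np.logical_and(pred, gt).sum()    # true positives before filtering
    tn_all = np.logical_and(~pred, ~gt).sum()  # true negatives before filtering
    dice, ftp, ftn = [], [], []
    for tau in thresholds:
        keep = unc <= tau  # voxels confident enough to survive the filter
        dice.append(_dice(pred[keep], gt[keep]))
        tp = np.logical_and(pred[keep], gt[keep]).sum()
        tn = np.logical_and(~pred[keep], ~gt[keep]).sum()
        # Fraction of correct voxels discarded as "too uncertain".
        ftp.append((tp_all - tp) / tp_all if tp_all else 0.0)
        ftn.append((tn_all - tn) / tn_all if tn_all else 0.0)
    # Assumed combination: reward filtered Dice, penalise filtered TP and TN.
    return (_auc(thresholds, dice)
            + (1.0 - _auc(thresholds, ftp))
            + (1.0 - _auc(thresholds, ftn))) / 3.0
```

On a toy four-voxel case with two errors, marking only the errors as uncertain should score higher than marking only the correct voxels as uncertain, which is the ordering the abstract describes.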