Isaksson Lars Johannes, Summers Paul, Bhalerao Abhir, Gandini Sara, Raimondi Sara, Pepa Matteo, Zaffaroni Mattia, Corrao Giulia, Mazzola Giovanni Carlo, Rotondi Marco, Lo Presti Giuliana, Haron Zaharudin, Alessi Sara, Pricolo Paola, Mistretta Francesco Alessandro, Luzzago Stefano, Cattani Federica, Musi Gennaro, De Cobelli Ottavio, Cremonesi Marta, Orecchia Roberto, Marvaso Giulia, Petralia Giuseppe, Jereczek-Fossa Barbara Alicja
Division of Radiation Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy.
Division of Radiology, IEO European Institute of Oncology IRCCS, Milan, Italy.
Insights Imaging. 2022 Aug 17;13(1):137. doi: 10.1186/s13244-022-01276-7.
Deploying an automatic segmentation model in practice should require rigorous quality assurance (QA) and continuous monitoring of the model's use and performance, particularly in high-stakes scenarios such as healthcare. Currently, however, tools to assist with QA for such models are not available to AI researchers. In this work, we build a deep learning model that estimates the quality of automatically generated contours.
The model was trained to predict segmentation quality by outputting an estimate of the Dice similarity coefficient, given an image-contour pair as input. Our dataset contained 60 axial T2-weighted MRI images of prostates with ground truth segmentations, along with 80 automatically generated segmentation masks. The model was a 3D version of the EfficientDet architecture with a custom regression head. For validation, we used fivefold cross-validation. To counteract the limitation of the small dataset, we used an extensive data augmentation scheme capable of producing a virtually unlimited number of training samples from a single ground truth label mask. In addition, we compared the results against a baseline model that uses only clinical variables for its predictions.
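As an illustration of the regression target and of how a single ground truth mask can yield many labeled training samples, the following is a minimal sketch, not the augmentation scheme used in this work. It computes the Dice similarity coefficient between two binary masks and produces a perturbed candidate mask whose Dice score against the ground truth is known by construction. The function names, the random-shift perturbation, and the `max_shift` parameter are assumptions made for the example.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def perturbed_sample(gt_mask, rng, max_shift=3):
    """Hypothetical augmentation: shift the ground truth mask by a random
    offset along every axis and label the result with its Dice score.

    np.roll wraps around the array border, so this assumes the structure
    lies well inside the volume.
    """
    gt_mask = np.asarray(gt_mask, dtype=bool)
    shift = rng.integers(-max_shift, max_shift + 1, size=gt_mask.ndim)
    candidate = np.roll(
        gt_mask,
        shift=tuple(int(s) for s in shift),
        axis=tuple(range(gt_mask.ndim)),
    )
    return candidate, dice(candidate, gt_mask)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.zeros((16, 32, 32), dtype=bool)
    gt[4:12, 10:22, 10:22] = True  # toy cuboid standing in for a prostate mask
    for _ in range(3):
        mask, target = perturbed_sample(gt, rng)
        print(mask.shape, round(target, 3))
```

Each call draws a new perturbation, so the number of (mask, Dice) pairs obtainable from one ground truth mask is limited only by the perturbation space. A realistic scheme would likely combine several perturbation types (for example, deformations or morphological changes) to cover the range of errors seen in practice.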
Our model achieved a mean absolute error of 0.020 ± 0.026 (2.2% mean percentage error) in estimating the Dice score, with a rank correlation of 0.42. Furthermore, the model correctly identified incorrect segmentations (defined in terms of acceptable/unacceptable) 99.6% of the time.
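The reported metrics can be sketched as follows, given arrays of true and predicted Dice scores. This is an illustrative sketch only: the abstract does not state which rank correlation was used or which Dice threshold separates acceptable from unacceptable segmentations, so the Spearman-style correlation (which here assumes no tied values) and the `threshold` argument are assumptions, as are the function names.

```python
import numpy as np


def mean_absolute_error(true_dice, pred_dice):
    """Mean absolute difference between true and predicted Dice scores."""
    true_dice = np.asarray(true_dice, dtype=float)
    pred_dice = np.asarray(pred_dice, dtype=float)
    return float(np.mean(np.abs(true_dice - pred_dice)))


def spearman_no_ties(x, y):
    """Spearman rank correlation, assuming neither input contains ties.

    Ranks are obtained with a double argsort, and the result is the
    Pearson correlation of the two rank vectors.
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def acceptability_agreement(true_dice, pred_dice, threshold):
    """Fraction of cases where the predicted and true Dice scores fall on
    the same side of a (hypothetical) acceptability threshold."""
    true_ok = np.asarray(true_dice, dtype=float) >= threshold
    pred_ok = np.asarray(pred_dice, dtype=float) >= threshold
    return float(np.mean(true_ok == pred_ok))


if __name__ == "__main__":
    # Toy values for illustration, not data from the study.
    true_scores = [0.90, 0.80, 0.95, 0.50]
    pred_scores = [0.88, 0.83, 0.94, 0.60]
    print(mean_absolute_error(true_scores, pred_scores))
    print(spearman_no_ties(true_scores, pred_scores))
    print(acceptability_agreement(true_scores, pred_scores, threshold=0.7))
```

A low mean absolute error can coexist with a modest rank correlation when most true Dice scores cluster in a narrow range: small prediction errors are then enough to reorder cases without moving any of them far from their true value.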
We believe that the trained model can be used alongside automatic segmentation tools to ensure quality and thus allow intervention to prevent undesired segmentation behavior.