Alzahrani Nouf M, Henry Ann M, Al-Qaisieh Bashar M, Murray Louise J, Nix Michael G
Department of Diagnostic Radiology, King Abdulaziz University, Jeddah, Saudi Arabia.
School of Medicine, University of Leeds, Leeds, UK.
J Appl Clin Med Phys. 2024 Dec;25(12):e14513. doi: 10.1002/acm2.14513. Epub 2024 Sep 16.
We have built a novel AI-driven QA method, AutoConfidence (ACo), to estimate segmentation confidence on a per-voxel basis without gold-standard segmentations, enabling robust, efficient review of automated segmentation (AS). We have demonstrated this method for brain organ-at-risk (OAR) AS on MRI, using internal and external (third-party) AS models.
Thirty-two retrospective, MRI-planned glioma cases were randomly selected from a local clinical cohort for ACo training. A generator was trained adversarially to produce internal autosegmentations (IAS), with a discriminator estimating voxel-wise IAS uncertainty given the input MRI. Confidence maps for each proposed segmentation were produced for operator use in AS editing and were compared with "difference to gold-standard" error maps. Nine cases were used to test ACo performance on IAS and to validate it on predictions from two external deep learning segmentation models [an external model with low-quality AS (EM-LQ) and an external model with high-quality AS (EM-HQ)]. Matthews correlation coefficient (MCC), false-positive rate (FPR), false-negative rate (FNR), and visual assessment were used for evaluation. Edge removal and geometric distance corrections were applied to achieve more useful and clinically relevant confidence maps and performance metrics.
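For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how MCC, FPR, and FNR could be computed when a binarized confidence map is compared with a "difference to gold-standard" error map. It is illustrative only and is not the authors' implementation. The function name `voxel_metrics`, the flattened 0/1 inputs, and the choice of "voxel flagged as erroneous" as the positive class are all assumptions; the abstract does not state how the confidence maps were thresholded or which class was treated as positive, and the edge removal and geometric distance corrections are not modeled here.

```python
import math


def voxel_metrics(predicted_error, true_error):
    """Compare two flattened binary voxel maps (hypothetical helper).

    predicted_error: 1 where the confidence map flags a voxel as likely wrong.
    true_error:      1 where the segmentation differs from the gold standard.
    Positive class = "erroneous voxel" (an assumption, not from the source).
    """
    if len(predicted_error) != len(true_error):
        raise ValueError("maps must contain the same number of voxels")

    tp = tn = fp = fn = 0
    for p, t in zip(predicted_error, true_error):
        if p and t:
            tp += 1          # error flagged and present
        elif p and not t:
            fp += 1          # error flagged but absent
        elif not p and t:
            fn += 1          # error present but missed
        else:
            tn += 1          # correctly left unflagged

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false-positive rate
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # false-negative rate
    return {"mcc": mcc, "fpr": fpr, "fnr": fnr}


# Toy example with eight voxels (made-up values, not study data).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = voxel_metrics(predicted, actual)
```

MCC uses all four cells of the confusion matrix, so it remains informative when erroneous voxels are a small minority of the volume, which is typically the case for a reasonable segmentation.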
ACo showed generally excellent performance on both internal and external segmentations across all OARs except the lenses. MCC was higher on IAS and low-quality external segmentations (EM-LQ) than on high-quality ones (EM-HQ). On IAS and EM-LQ, average MCC (excluding lenses) ranged from 0.6 to 0.9, while average FPR and FNR were ≤0.13 and ≤0.21, respectively. For EM-HQ, average MCC ranged from 0.4 to 0.8, while average FPR and FNR were ≤0.37 and ≤0.22, respectively.
ACo was a reliable predictor of uncertainty and errors on AS generated both internally and externally, demonstrating its potential as an independent, reference-free QA tool, which could help operators deliver robust, efficient autosegmentation in the radiotherapy clinic.