Kim Sihwan, Park Changmin, Jeon Gwanghyeon, Kim Seohee, Kim Jong Hyo
Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 08826, Republic of Korea.
ClariPi Research, ClariPi Inc., Seoul 03088, Republic of Korea.
Bioengineering (Basel). 2025 Jan 16;12(1):81. doi: 10.3390/bioengineering12010081.
Recent advancements in deep learning have significantly improved medical image segmentation. However, the generalization performance and potential risks of data-driven models remain insufficiently validated. In particular, deep learning-based models often produce unrealistic segmentation predictions that deviate from actual anatomical structures, known as Seg-Hallucinations. Seg-Hallucinations can result in erroneous quantitative analyses and distort critical imaging biomarker information, yet effective methods to audit or correct them are rare. Therefore, we propose an automated Seg-Hallucination surveillance and correction (ASHSC) algorithm that uses only 3D organ mask information derived from CT images, without relying on ground truth. Two publicly available datasets were used to develop and evaluate the ASHSC algorithm: 280 CT scans from the TotalSegmentator dataset for training and 274 CT scans from the Cancer Imaging Archive (TCIA) dataset for performance evaluation. The ASHSC algorithm uses a two-stage on-demand strategy with mesh-based convolutional neural networks and generative artificial intelligence. The segmentation quality level (SQ-level)-based surveillance stage was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and positive predictive value (PPV). The on-demand correction performance of the algorithm was assessed using four similarity metrics: volumetric Dice score, volume error percentage, average surface distance, and Hausdorff distance. On the test dataset, the surveillance stage achieved an average AUROC of 0.94 ± 0.01, sensitivity of 0.82 ± 0.03, specificity of 0.90 ± 0.01, and PPV of 0.92 ± 0.01. After on-demand refinement in the correction stage, all four similarity metrics improved compared with using the AI segmentation model alone.
This study not only enhances the efficiency and reliability of handling Seg-Hallucinations but also eliminates the reliance on ground truth. The ASHSC algorithm offers intuitive 3D guidance for uncertain regions while maintaining manageable computational complexity. The SQ-level-based on-demand correction strategy adaptively minimizes the uncertainties inherent in deep learning-based organ masks and advances automated auditing and correction methodologies.
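As an illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes a volumetric Dice score and a volume error percentage for two binary 3D masks, along with sensitivity, specificity, and PPV for binary surveillance labels. It is not the authors' implementation: the function names are hypothetical, the volume error is assumed to be an absolute difference relative to the reference volume, and the surface-based metrics (average surface distance, Hausdorff distance) are omitted.

```python
import numpy as np


def dice_score(pred, ref):
    """Volumetric Dice overlap between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def volume_error_pct(pred, ref):
    """Absolute volume difference as a percentage of the reference volume.

    Assumption: the abstract does not state whether the error is signed,
    so the absolute value is used here.
    """
    pred_vol = np.count_nonzero(pred)
    ref_vol = np.count_nonzero(ref)
    if ref_vol == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(pred_vol - ref_vol) / ref_vol


def surveillance_metrics(y_true, y_pred):
    """Sensitivity, specificity, and PPV for binary labels (1 = positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy 4x4x4 masks: the predicted mask overshoots the reference organ.
    ref = np.zeros((4, 4, 4), dtype=bool)
    ref[:1] = True
    pred = np.zeros((4, 4, 4), dtype=bool)
    pred[:2] = True
    print(dice_score(pred, ref), volume_error_pct(pred, ref))
    print(surveillance_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Which class counts as "positive" in the surveillance stage (acceptable versus hallucinated masks) is not specified in the abstract, so the label convention above is a placeholder.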