Lenfant Louis, Beitone Clément, Troccaz Jocelyne, Fiard Gaelle, Malavaud Bernard, Voros Sandrine, Mozer Pierre C
Predictive Onco-Urology, GRC n°5, Sorbonne Université, AP-HP, Hôpital Pitié-Salpêtrière, Urology, Paris, France.
Université Grenoble Alpes, CNRS, INSERM, Grenoble INP, TIMC, Grenoble, France.
Med Phys. 2025 Aug;52(8):e18025. doi: 10.1002/mp.18025.
Accurate prostate segmentation in transrectal ultrasound (TRUS) imaging is essential for diagnosis, treatment planning, and developing artificial intelligence (AI) algorithms. Although manual segmentation is often recommended as the ground truth for AI training, it is time-consuming, prone to inter- and intra-observer variability, and rarely used in everyday clinical practice. Semi-automatic methods provide a faster alternative but lack thorough multi-operator evaluations. Understanding variability in segmentation methods is crucial to defining a reliable reference standard for future AI training.
To investigate the inter-individual variability in manual and semi-automatic prostate contour segmentation on 3D TRUS images and to compare both approaches to determine the most consistent method that could serve as a reference standard for future AI model development.
This study is a methodological investigation and not an AI study. Four urology experts independently performed manual and semi-automatic segmentation on 100 prostate 3D TRUS exams obtained from patients undergoing fusion prostate biopsy. Inter-individual and intra-individual variability for manual segmentation was assessed using the Average Surface Distance (ASD) between manually placed points and a reference mesh. Two methods were used to create the reference prostate mesh after manual point positioning: a statistical shape model (manual_SSM) and a deformable model (manual_soft-SSM). Semi-automatic segmentations were evaluated using ASD, Dice similarity coefficient, and Hausdorff distance. A Simultaneous Truth and Performance Level Estimation (STAPLE)-like consensus method was applied to assess variability across experts in semi-automatic segmentation. Statistical comparisons used Wilcoxon tests, and effect sizes were calculated using Cohen's d. Bonferroni correction was applied for multiple comparisons. A significance level of p < 0.05 (adjusted as needed) was used.
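As an illustration of the three agreement metrics named above, the following is a minimal sketch and not the study's implementation. It assumes surfaces are given as lists of 3D points in millimetres and masks as collections of voxel indices; the function names are hypothetical, and the symmetric form of ASD is an assumption (the abstract describes a distance from manually placed points to a reference mesh, which may be one-directional).

```python
import math

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * len(a & b) / (len(a) + len(b))

def nearest_distances(src, dst):
    """Distance from each point in src to its nearest point in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def average_surface_distance(surf_a, surf_b):
    """Symmetric ASD (assumed form): mean nearest-point distance in both directions."""
    d_ab = nearest_distances(surf_a, surf_b)
    d_ba = nearest_distances(surf_b, surf_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))

def hausdorff_distance(surf_a, surf_b):
    """Symmetric Hausdorff distance: the worst-case nearest-point distance."""
    return max(max(nearest_distances(surf_a, surf_b)),
               max(nearest_distances(surf_b, surf_a)))

# Toy point sets, purely illustrative (not study data).
surf_a = [(0, 0, 0), (1, 0, 0)]
surf_b = [(0, 0, 1), (1, 0, 1), (3, 0, 1)]
print(average_surface_distance(surf_a, surf_b))
print(hausdorff_distance(surf_a, surf_b))
```

The brute-force nearest-point search is quadratic in the number of points, which is fine for a toy example; dense meshes would normally call for a spatial index such as a k-d tree.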
Manual segmentation inter-individual variability was higher with the manual_SSM method [ASD = 2.6 mm (interquartile range (IQR) 2.3-3.0)] than with the manual_soft-SSM method [ASD = 1.5 mm (IQR 1.2-1.8), p < 0.001]. Intra-individual variability was also lower with manual_soft-SSM than with manual_SSM [ASD 1.0 (0.8-1.1) versus 2.2 (1.9-2.6), p < 0.001]. For semi-automatic segmentation, inter-individual variability yielded an ASD of 1.4 mm (IQR 1.1-1.9), Dice of 0.90 (IQR 0.88-0.92), and Hausdorff distance of 5.7 mm (IQR 4.47-7.36). Manual and semi-automatic segmentation comparisons demonstrated an ASD of 1.43 mm (IQR 1.20-1.90).
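The summary statistics reported here (median with IQR, Cohen's d, and a Bonferroni-adjusted significance level) could be computed along the following lines. This is a hedged sketch using only the Python standard library: the function names are hypothetical, the quartile interpolation method and the pooled-standard-deviation form of Cohen's d are assumptions (the abstract does not state which variants were used), and the Wilcoxon test itself is left to a statistics package such as `scipy.stats.wilcoxon`.

```python
import math
import statistics

def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartile interpolation (assumed)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3

def cohens_d(x, y):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

# Toy values, purely illustrative (not study data).
method_a = [2.0, 4.0, 6.0]
method_b = [1.0, 3.0, 5.0]
print(median_iqr([1, 2, 3, 4, 5]))
print(cohens_d(method_a, method_b))
print(bonferroni_alpha(0.05, 5))
```

For paired measurements on the same exams, a paired effect size (mean difference divided by the standard deviation of the differences) is a common alternative to the pooled form shown here.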
The semi-automatic segmentation method evaluated in this study demonstrated comparable accuracy to manual segmentation while reducing inter- and intra-individual variability. These findings suggest that the tested semi-automatic approach can serve as a reliable reference standard for AI training in prostate segmentation.