Chen Xinru, Zhao Yao, Baroudi Hana, El Basha Mohammad D, Daniel Aji, Gay Skylar S, Yu Cenji, Wang He, Phan Jack, Choi Seungtaek L, Goodman Chelain R, Zhang Xiaodong, Niedzielski Joshua S, Shete Sanjay S, Court Laurence E, Liao Zhongxing, Löfman Fredrik, Balter Peter A, Yang Jinzhong
Department of Radiation Physics, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX 77030, USA.
Diagnostics (Basel). 2024 Dec 18;14(24):2851. doi: 10.3390/diagnostics14242851.
BACKGROUND/OBJECTIVES: We assessed the influence of local patient and clinical characteristics on the performance of commercial deep learning (DL) segmentation models for head-and-neck (HN), breast, and prostate cancers.
METHODS: Clinical computed tomography (CT) scans and clinically approved contours of 210 patients (53 HN, 49 left breast, 55 right breast, and 53 prostate cancer) were used to train and validate segmentation models integrated within a vendor-supplied DL training toolkit and to assess the performance of both vendor-pretrained and custom-trained models. Four custom models (HN, left breast, right breast, and prostate) were trained and validated with 30 (training)/5 (validation) HN, 34/5 left breast, 39/5 right breast, and 30/5 prostate patients to auto-segment a total of 24 organs at risk (OARs). Subsequently, both vendor-pretrained and custom-trained models were tested on the remaining patients from each group. Auto-segmented contours were evaluated by comparing them with clinically approved contours via the Dice similarity coefficient (DSC) and mean surface distance (MSD). The performance of the left and right breast models was assessed jointly according to ipsilateral/contralateral locations.
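For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how DSC and a symmetric MSD can be computed from binary segmentation masks and surface point sets; it assumes NumPy and SciPy are available and is not the evaluation pipeline used in the study, which the abstract does not specify.

```python
import numpy as np
from scipy.spatial import cKDTree


def dice_similarity(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks (1.0 = perfect overlap)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0


def mean_surface_distance(surface_a: np.ndarray, surface_b: np.ndarray) -> float:
    """Symmetric mean surface distance (in mm if inputs are in mm).

    surface_a and surface_b are (N, 3) arrays of surface points extracted
    from the auto-segmented and clinically approved contours, respectively.
    """
    d_ab = cKDTree(surface_b).query(surface_a)[0]  # nearest-neighbor distances A -> B
    d_ba = cKDTree(surface_a).query(surface_b)[0]  # nearest-neighbor distances B -> A
    return 0.5 * (d_ab.mean() + d_ba.mean())
```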
RESULTS: The average DSCs for all structures in vendor-pretrained and custom-trained models were as follows: 0.81 ± 0.12 and 0.86 ± 0.11 in HN; 0.67 ± 0.16 and 0.80 ± 0.11 in the breast; and 0.87 ± 0.09 and 0.92 ± 0.06 in the prostate. The corresponding average MSDs were 0.81 ± 0.76 mm and 0.76 ± 0.56 mm (HN), 4.85 ± 2.44 mm and 2.42 ± 1.49 mm (breast), and 2.17 ± 1.39 mm and 1.21 ± 1.00 mm (prostate). Notably, custom-trained models showed significant improvements over vendor-pretrained models for 14 of 24 OARs, reflecting the influence of data/contouring variations on segmentation performance.
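The abstract does not state which statistical test was used to judge per-OAR improvements. As an illustration only, one common choice for this kind of paired, non-normally distributed comparison is a Wilcoxon signed-rank test on per-patient DSC values for each OAR, sketched below; the per-OAR data layout is assumed, not taken from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_models_for_oar(dsc_pretrained: np.ndarray,
                           dsc_custom: np.ndarray,
                           alpha: float = 0.05):
    """Paired Wilcoxon signed-rank test on per-patient DSC values for one OAR.

    Both inputs are 1-D arrays of DSC values for the same test patients,
    one value per patient, for the vendor-pretrained and custom models.
    Returns the p-value and whether the difference is significant at alpha.
    """
    _, p_value = wilcoxon(dsc_pretrained, dsc_custom)
    return p_value, p_value < alpha
```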
CONCLUSIONS: These findings underscore the substantial impact of institutional preferences and clinical practices on the implementation of vendor-pretrained models. We also found that a relatively small amount of institutional data was sufficient to train customized segmentation models with adequate accuracy.