Chatterjee Devina, Kanhere Adway, Doo Florence X, Zhao Jerry, Chan Andrew, Welsh Alexander, Kulkarni Pranav, Trang Annie, Parekh Vishwa S, Yi Paul H
Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, MD, USA.
Department of Diagnostic Imaging, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, 38105 TN, USA.
J Imaging Inform Med. 2025 Jun;38(3):1628-1641. doi: 10.1007/s10278-024-01273-w. Epub 2024 Sep 19.
Deep learning (DL) tools developed on adult datasets may not generalize well to pediatric patients, posing potential safety risks. We evaluated the performance of TotalSegmentator, a state-of-the-art adult-trained CT organ segmentation model, on a subset of organs in a pediatric CT dataset and explored optimization strategies to improve pediatric segmentation performance. TotalSegmentator was retrospectively evaluated on abdominal CT scans from an external adult dataset (n = 300) and an external pediatric dataset (n = 359). Generalizability was quantified by comparing Dice scores between the adult and pediatric external datasets using Mann-Whitney U tests. Two DL optimization approaches were then evaluated: (1) a 3D nnU-Net model trained only on pediatric data, and (2) an adult nnU-Net model fine-tuned on the pediatric cases. TotalSegmentator had a significantly lower overall mean Dice score on pediatric than on adult CT scans (0.73 vs. 0.81, P < .001), demonstrating limited generalizability to pediatric CT scans. Stratified by organ, the mean pediatric Dice score was lower than the adult score for four organs (all P < .001): right adrenal gland (0.41 [0.39-0.43] vs. 0.69 [0.66-0.71]), left adrenal gland (0.35 [0.32-0.37] vs. 0.68 [0.65-0.71]), duodenum (0.47 [0.45-0.49] vs. 0.67 [0.64-0.69]), and pancreas (0.73 [0.72-0.74] vs. 0.79 [0.77-0.81]). Both optimization approaches, the pediatric-specific model and the adult-trained model fine-tuned on pediatric images, significantly improved segmentation accuracy on pediatric CT scans over TotalSegmentator for all organs, especially for smaller anatomical structures (e.g., > 0.2 higher mean Dice for adrenal glands; P < .001).
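The two quantities the abstract relies on, the Dice score and the Mann-Whitney U test, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `dice_score` and `mann_whitney_u` are hypothetical, masks are assumed to be flat binary sequences, two empty masks are assumed to score 1.0, and the P value uses a tie-corrected normal approximation without continuity correction, which is only one of several common variants.

```python
from math import erfc, sqrt


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided P) via a normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))  # two-sided P from the standard normal


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    pediatric = [dice_score([1, 1, 0, 0], [1, 0, 0, 0]), 0.40, 0.45]
    adult = [0.70, 0.75, 0.80]
    u, p = mann_whitney_u(pediatric, adult)
    print(f"U = {u}, two-sided P = {p:.3f}")
```

In practice, per-case Dice scores would be computed for each organ on 3D label volumes and the two cohorts compared with an established statistics package; the sketch only makes the definitions concrete.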