Tozuka Ryota, Kadoya Noriyuki, Yasunaga Arata, Saito Masahide, Komiyama Takafumi, Nemoto Hikaru, Ando Hidetoshi, Onishi Hiroshi, Jingu Keiichi
Department of Radiation Oncology, Tohoku University Graduate School of Medicine, Sendai, Japan.
Department of Radiology, University of Yamanashi, Yamanashi, Japan.
Jpn J Radiol. 2025 Sep 15. doi: 10.1007/s11604-025-01865-8.
To develop and evaluate a novel deep learning strategy for automated early-stage lung cancer gross tumor volume (GTV) segmentation, utilizing pre-training with mathematically generated non-natural fractal images.
This retrospective study included 104 patients (36-91 years old; 81 males; 23 females) with peripheral early-stage non-small cell lung cancer who underwent radiotherapy at our institution from December 2017 to March 2025. First, we utilized encoders from a Convolutional Neural Network and a Vision Transformer (ViT), pre-trained with four learning strategies: from scratch, ImageNet-1K (1,000 classes of natural images), FractalDB-1K (1,000 classes of fractal images), and FractalDB-10K (10,000 classes of fractal images), with the latter three utilizing publicly available models. Second, the models were fine-tuned using CT images and physician-created contour data. Model accuracy was then evaluated using the volumetric Dice Similarity Coefficient (vDSC), surface Dice Similarity Coefficient (sDSC), and 95th percentile Hausdorff Distance (HD95) between the predicted and ground truth GTV contours, averaged across the fourfold cross-validation. Additionally, the segmentation accuracy was compared between simple and complex groups, categorized by the surface-to-volume ratio, to assess the impact of GTV shape complexity.
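The evaluation metrics above can be illustrated with a minimal sketch of how the volumetric Dice Similarity Coefficient and the 95th percentile Hausdorff Distance are typically computed from binary segmentation masks. This is not the authors' implementation; the function names and the NumPy/SciPy approach are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def volumetric_dice(pred, gt):
    """vDSC = 2|A ∩ B| / (|A| + |B|) on boolean voxel masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff Distance (mm) between mask surfaces.

    Surface voxels are the mask minus its erosion; distances are taken
    from each surface voxel to the nearest surface voxel of the other
    mask, with anisotropic voxel spacing handled via `sampling`.
    """
    pred_surf = pred & ~binary_erosion(pred)
    gt_surf = gt & ~binary_erosion(gt)
    # Distance from every voxel to the nearest surface voxel of each mask
    dist_to_gt = distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    all_dists = np.concatenate([dist_to_gt[pred_surf], dist_to_pred[gt_surf]])
    return np.percentile(all_dists, 95)

# Toy example: a 6x6x6 cube vs. the same cube shifted by one voxel
pred = np.zeros((12, 12, 12), dtype=bool)
gt = np.zeros((12, 12, 12), dtype=bool)
pred[2:8, 2:8, 2:8] = True
gt[3:9, 2:8, 2:8] = True
```

The surface DSC used in the study is computed analogously, but counts the fraction of surface voxels lying within a tolerance of the other surface rather than taking a distance percentile.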
Pre-training with FractalDB-10K yielded the best segmentation accuracy across all metrics. For the ViT model, the vDSC, sDSC, and HD95 were 0.800 ± 0.079, 0.732 ± 0.152, and 2.04 ± 1.59 mm for FractalDB-10K; 0.779 ± 0.093, 0.688 ± 0.156, and 2.72 ± 3.12 mm for FractalDB-1K; and 0.764 ± 0.102, 0.660 ± 0.156, and 3.03 ± 3.47 mm for ImageNet-1K, respectively. Comparing FractalDB-1K with ImageNet-1K, there was no significant difference in the simple group, whereas FractalDB-1K achieved a significantly higher vDSC in the complex group (0.743 ± 0.095 vs 0.714 ± 0.104, p = 0.006).
Pre-training with fractal structures achieved comparable or superior accuracy to ImageNet pre-training for early-stage lung cancer GTV auto-segmentation.