Yu Songli, Li Yunxiang, Jiao Pengfei, Liu Yixiu, Zhao Jianxiang, Yan Chenggang, Wang Qifeng, Wang Shuai
School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China.
Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, Texas, USA.
Med Phys. 2025 Jul;52(7):e17818. doi: 10.1002/mp.17818. Epub 2025 Apr 14.
Accurate and reliable segmentation of the esophageal gross tumor volume (GTV) in computed tomography (CT) is beneficial for diagnosis and treatment. However, this remains a challenging task because the esophagus has a variable shape and an extensive vertical range, so tumors can appear at any position along it.
This study introduces a novel CNN-Transformer-based U-shaped model (LRRM-U-TransNet) designed to improve the segmentation accuracy of the esophageal GTV. By leveraging advanced deep learning techniques, we aim to address the challenges posed by the variable shape and extensive range of the esophagus, ultimately improving diagnostic and treatment outcomes.
Specifically, we propose a long-range relay mechanism that aggregates feature information from all layers by progressively passing adjacent-layer feature maps along pixel and semantic pathways. Moreover, we propose two ready-to-use blocks that implement this mechanism concretely. The Dual FastViT block fuses the feature maps from the two pathways to enhance feature representation capability. The Dual AxialViT block acts as a secondary auxiliary bottleneck that acquires global information for more precise feature map reconstruction.
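To make the relay idea concrete, the following is a minimal PyTorch sketch of progressively passing adjacent-layer feature maps so that deeper stages carry information from every earlier layer. The module and function names (RelayFusion, long_range_relay) and the simple convolutional fusion are hypothetical illustrations of the mechanism, not the paper's actual Dual FastViT or Dual AxialViT designs.

# Minimal sketch of a long-range relay over an encoder's feature pyramid.
# All names and the conv-based fusion are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelayFusion(nn.Module):
    """Fuses a relayed (shallower) map with the current stage's map."""
    def __init__(self, relay_ch: int, curr_ch: int):
        super().__init__()
        # Project the relayed features to the current channel width.
        self.proj = nn.Conv2d(relay_ch, curr_ch, kernel_size=1)
        self.mix = nn.Sequential(
            nn.Conv2d(2 * curr_ch, curr_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(curr_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, relay: torch.Tensor, curr: torch.Tensor) -> torch.Tensor:
        # Match the relayed map's spatial size to the current map, then mix.
        relay = F.interpolate(self.proj(relay), size=curr.shape[-2:],
                              mode="bilinear", align_corners=False)
        return self.mix(torch.cat([relay, curr], dim=1))

def long_range_relay(feats, fusers):
    """Hands each stage's features to the next stage, so the deepest
    output aggregates information from every earlier layer."""
    relay = feats[0]
    fused = [relay]
    for f, fuse in zip(feats[1:], fusers):
        relay = fuse(relay, f)   # adjacent-layer handoff
        fused.append(relay)
    return fused

if __name__ == "__main__":
    # Toy pyramid: three stages with growing channels, shrinking resolution.
    chans = [16, 32, 64]
    feats = [torch.randn(1, c, 64 // 2**i, 64 // 2**i)
             for i, c in enumerate(chans)]
    fusers = nn.ModuleList(RelayFusion(chans[i], chans[i + 1])
                           for i in range(len(chans) - 1))
    for t in long_range_relay(feats, fusers):
        print(t.shape)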
We build a new esophageal tumor dataset of 1665 real-world patient CT samples annotated by five expert radiologists and employ multiple evaluation metrics to validate our model. Five-fold cross-validation on this dataset shows that LRRM-U-TransNet achieves a Dice coefficient of 0.834, a Jaccard coefficient of 0.730, a Precision of 0.840, an HD95 of 3.234 mm, and a Volume Similarity of 0.143.
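For reference, the sketch below shows one common way to compute the reported metrics from binary segmentation masks, assuming NumPy/SciPy arrays and millimeter voxel spacing. The helper names are hypothetical, and conventions differ: HD95 here is the 95th-percentile symmetric surface distance, and Volume Similarity is taken as the lower-is-better volume-difference ratio, with which the paper's reported 0.143 appears consistent.

# Sketch of the reported metrics on binary masks; names are illustrative.
import numpy as np
from scipy import ndimage

def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    dice = 2 * tp / (pred.sum() + gt.sum())
    jaccard = tp / np.logical_or(pred, gt).sum()
    precision = tp / pred.sum()
    # Volume-difference ratio (lower is better); some authors report
    # 1 minus this quantity instead.
    vs = abs(pred.sum() - gt.sum()) / (pred.sum() + gt.sum())
    return {"Dice": dice, "Jaccard": jaccard,
            "Precision": precision, "VS": vs}

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance between mask surfaces."""
    def surface(mask):
        return mask & ~ndimage.binary_erosion(mask)
    sp, sg = surface(pred.astype(bool)), surface(gt.astype(bool))
    # Distance from every voxel to the other mask's surface, in mm.
    dg = ndimage.distance_transform_edt(~sg, sampling=spacing)
    dp = ndimage.distance_transform_edt(~sp, sampling=spacing)
    dists = np.concatenate([dg[sp], dp[sg]])
    return float(np.percentile(dists, 95))

if __name__ == "__main__":
    gt = np.zeros((32, 32, 32), bool); gt[8:24, 8:24, 8:24] = True
    pred = np.zeros_like(gt); pred[10:24, 8:24, 8:24] = True
    print(overlap_metrics(pred, gt), "HD95:", hd95(pred, gt))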
We propose a CNN-Transformer hybrid deep learning network to improve the segmentation of esophageal tumors. We exploit local and global information between shallower and deeper layers to prevent early information loss and enhance cross-layer communication. To validate our model, we collect a dataset of 1665 esophageal tumor CT images from Sichuan Tumor Hospital. The results show that our model outperforms state-of-the-art models, which is of practical significance for improving the accuracy and clinical applicability of esophageal tumor segmentation.