Department of Radiation Oncology, The First Affiliated Hospital of Soochow University, Suzhou, China.
Radiat Oncol. 2024 May 12;19(1):55. doi: 10.1186/s13014-024-02448-z.
Currently, automatic esophagus segmentation remains a challenging task because of the organ's small size, low contrast, and large shape variation. We aimed to improve the performance of deep-learning esophagus segmentation by applying a two-stage strategy: first locating the object, then performing the segmentation task.
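The locate-then-segment idea can be sketched as a fixed-size region-of-interest crop around a predicted esophagus center, which is then passed to the segmentation network instead of the full slice (a minimal illustration with hypothetical helper names, not the authors' implementation):

```python
import numpy as np

def crop_around_center(ct_slice, center, size=128):
    """Extract a fixed-size ROI around a predicted object center.

    In a locate-then-segment pipeline, the localization network predicts
    `center` (row, col) per slice, and the segmentation network sees only
    this crop. Hypothetical helper, not the authors' code.
    """
    h, w = ct_slice.shape
    half = size // 2
    # Clip the center so the crop stays inside the image bounds
    cy = int(np.clip(center[0], half, h - half))
    cx = int(np.clip(center[1], half, w - half))
    return ct_slice[cy - half:cy + half, cx - half:cx + half]
```

Cropping concentrates the network's capacity on the small, low-contrast target rather than the whole thorax, which is the main motivation for the two-stage design.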
A total of 100 cases with thoracic computed tomography scans from two publicly available datasets were used in this study. A modified CenterNet, an object-localization network, was employed to locate the center of the esophagus on each slice. The 3D U-net and 2D U-net_coarse models were then trained to segment the esophagus based on the predicted object centers. A 2D U-net_fine model was trained on object centers updated according to the 3D U-net predictions. The Dice similarity coefficient and the 95% Hausdorff distance were used as quantitative metrics of delineation performance. The characteristics of the esophageal contours delineated automatically by the 2D U-net and 3D U-net models were summarized, the impact of object-localization accuracy on delineation performance was analyzed, and delineation performance in different segments of the esophagus was also summarized.
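The two evaluation metrics can be sketched as follows (a brute-force illustration on binary masks; distances between foreground voxels are used in place of extracted surfaces, a common simplification, and the helper names are hypothetical):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom

def hd95(pred, gt, spacing=(1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance between two masks."""
    sp = np.asarray(spacing)
    p = np.argwhere(pred) * sp  # physical coordinates of foreground voxels
    g = np.argwhere(gt) * sp
    # Pairwise distances, then the nearest-neighbor distance in each direction
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    dists = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(dists, 95))
```

The 95th percentile, rather than the maximum, makes the Hausdorff distance robust to a few outlier voxels, which is why it is preferred for evaluating organ delineation.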
The mean Dice coefficients of the 3D U-net, 2D U-net_coarse, and 2D U-net_fine models were 0.77, 0.81, and 0.82, respectively; the corresponding 95% Hausdorff distances were 6.55, 3.57, and 3.76. Compared with the 2D U-net, the 3D U-net delineated wrong objects less often but missed objects more often. After the refined object centers were used, the mean Dice coefficient improved by 5.5% in cases with a Dice coefficient below 0.75, but by only 0.3% in cases above 0.75. Dice coefficients were lower for the esophageal segment between the inferior orifice and the pulmonary bifurcation than for the other regions.
The 3D U-net model tended to delineate fewer incorrect objects but also to miss more objects. A two-stage strategy with accurate object localization can enhance the robustness of the segmentation model and significantly improve esophageal delineation performance, especially in cases with poor delineation results.