Geometric and Dosimetric Evaluation of a RayStation Deep Learning Model for Auto-Segmentation of Organs at Risk in a Real-World Head and Neck Cancer Dataset.

Author information

Sharma D, Singh G, Burela N, Gayen S, Aishwarya G, Nangia S

Affiliation

Department of Medical Physics, Apollo Proton Cancer Center, Chennai, Tamil Nadu, India.

Publication information

Clin Oncol (R Coll Radiol). 2025 May;41:103796. doi: 10.1016/j.clon.2025.103796. Epub 2025 Mar 1.

PMID:40120536

Abstract

AIMS

To assess the geometric accuracy and dosimetric impact of a deep learning segmentation (DLS) model on a large, diverse dataset of head and neck cancer (HNC) patients treated with intensity-modulated proton therapy (IMPT).

MATERIALS AND METHODS

A 3D U-Net-based DLS model was applied to the CT datasets of 124 HNC patients treated with IMPT to 50.4-70.0 GyRBE. Thirty manually delineated organs at risk (GT-OARs) were compared with auto-segmented OARs, both without (DLS-nonedited) and with (DLS-edited) manual correction, using volume, Dice similarity coefficient (DSC), and Hausdorff distance (HD). The dosimetric impact of auto-segmentation error was assessed as the absolute dose difference in mean (ΔDmean) and maximum (ΔDmax) dose.
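The geometric and dosimetric comparisons described above rest on standard definitions: DSC = 2|A ∩ B| / (|A| + |B|), and HD is the largest distance from a point in one contour to the nearest point in the other. The following is a minimal sketch, not the RayStation implementation or the authors' analysis code. It assumes masks stored as sets of voxel indices on an isotropic grid, a brute-force symmetric maximum HD (the abstract does not say whether a maximum or a percentile HD was used), and dose deltas kept signed (DLS minus GT), on the assumption that "absolute" refers to dose units, given that results are reported as ±. The names `dice`, `hausdorff_mm`, `dose_deltas`, and `Voxel` are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (i, j, k) index on an assumed isotropic grid


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumption: two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def _directed_hd(a: Set[Voxel], b: Set[Voxel], spacing_mm: float) -> float:
    # Largest distance from any voxel in `a` to its nearest voxel in `b` (brute force).
    return max(min(math.dist(p, q) for q in b) for p in a) * spacing_mm


def hausdorff_mm(a: Set[Voxel], b: Set[Voxel], spacing_mm: float = 1.0) -> float:
    """Symmetric (maximum) Hausdorff distance in millimetres."""
    if not a or not b:
        raise ValueError("Hausdorff distance needs two non-empty masks")
    return max(_directed_hd(a, b, spacing_mm), _directed_hd(b, a, spacing_mm))


def dose_deltas(dose: Dict[Voxel, float], gt: Set[Voxel], dls: Set[Voxel]) -> Tuple[float, float]:
    """Signed (DLS minus GT) differences in mean and max dose, in the dose grid's units."""
    def stats(mask: Set[Voxel]) -> Tuple[float, float]:
        values = [dose[v] for v in mask]  # assumes equal voxel volumes
        return sum(values) / len(values), max(values)

    gt_mean, gt_max = stats(gt)
    dls_mean, dls_max = stats(dls)
    return dls_mean - gt_mean, dls_max - gt_max


# Toy example: the auto-segmented contour is shifted by one voxel along x.
gt = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
dls = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
dose = {(i, 0, 0): 100.0 * i for i in range(5)}  # hypothetical dose in cGyRBE

print(dice(gt, dls))                          # → 0.75
print(hausdorff_mm(gt, dls, spacing_mm=2.0))  # → 2.0
print(dose_deltas(dose, gt, dls))             # → (100.0, 100.0)
```

The brute-force HD is quadratic in the number of voxels and is meant only to make the definition concrete; real CT-scale masks would normally use surface extraction and a distance transform or spatial index.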

RESULTS

The cohort included patients who were postoperative (47.6%) or had flap reconstruction (12.1%), mouth bites (79.8%), dental implants (54.8%), or surgical implants (3.2%). DLS failed in 11 patients with significant anatomical challenges and artifacts. Compared with GT-OARs, DLS-nonedited contours under-segmented 11 of 12 Gr-A OARs (central nervous system, arteries, bone; p < 0.05) and over-segmented 13 of 18 Gr-B OARs (glandular, digestive, airways). DSC was good (>0.8), intermediate (0.6-0.8), intermediate-poor (0.5-0.6), and poor (<0.5) for 12, 6, 4, and 8 OARs, respectively. HD was good (<4 mm), intermediate (4-6 mm), poor (6-8 mm), and very poor (>8 mm) for 5, 7, 4, and 14 OARs, respectively. Compared with the manually corrected DLS-edited OARs, all DLS-nonedited OARs showed excellent similarity, with DSC > 0.8 and HD < 4 mm. On average, auto-segmentation took 2.51 minutes and manual correction took 6.24 minutes. The mean ΔDmean and ΔDmax values were within ±300 cGyRBE and ±3 cGyRBE, respectively, except for the oesophagus and larynx, where the mean ΔDmean reached up to 837.14 cGyRBE.
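The tiered reporting of DSC and HD amounts to threshold lookups over per-OAR scores. The following is a minimal sketch using the cut-offs stated in the results. The abstract does not say which tier a boundary value (for example, DSC = 0.8 or HD = 4 mm) belongs to, so the boundary handling here is an assumption. The names `dsc_tier`, `hd_tier`, and `tally`, and the sample scores, are hypothetical. The reported tier counts are taken from the abstract and checked against the 30 analysed OARs.

```python
from collections import Counter
from typing import Callable, Iterable


def dsc_tier(dsc: float) -> str:
    # Cut-offs from the abstract; assigning boundary values (0.8, 0.6, 0.5) is an assumption.
    if dsc > 0.8:
        return "good"
    if dsc >= 0.6:
        return "intermediate"
    if dsc >= 0.5:
        return "intermediate-poor"
    return "poor"


def hd_tier(hd_mm: float) -> str:
    # Cut-offs in mm from the abstract; assigning boundary values (4, 6, 8) is an assumption.
    if hd_mm < 4:
        return "good"
    if hd_mm <= 6:
        return "intermediate"
    if hd_mm <= 8:
        return "poor"
    return "very poor"


def tally(values: Iterable[float], tier_fn: Callable[[float], str]) -> Counter:
    """Count how many OAR scores fall into each tier."""
    return Counter(tier_fn(v) for v in values)


# Tier counts as reported in the abstract; each set should cover all 30 OARs.
REPORTED_DSC = {"good": 12, "intermediate": 6, "intermediate-poor": 4, "poor": 8}
REPORTED_HD = {"good": 5, "intermediate": 7, "poor": 4, "very poor": 14}
assert sum(REPORTED_DSC.values()) == 30
assert sum(REPORTED_HD.values()) == 30

example_dsc = [0.91, 0.72, 0.55, 0.31]  # hypothetical per-OAR scores
print(tally(example_dsc, dsc_tier))
```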

CONCLUSION

Patient posture, nonbiological materials, and anatomical deformities influence DLS accuracy. The model's overall performance is adequate and efficient, with skilled manual editing needed for a few OARs.
