Yuen Johnson, Deshpande Shrikant, Poder Joel, Jameson Michael G, Schmidt Laurel, Trakis Stami, Shields-Dowton Kendell, Saba Anastasia, Popovic Gordana, Rahbari Reza, Holloway Lois
St George Hospital Cancer Care Centre, Kogarah, New South Wales, Australia; South Western Clinical School, University of New South Wales, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Sydney, New South Wales, Australia.
South Western Clinical School, University of New South Wales, Sydney, New South Wales, Australia; Ingham Institute for Applied Medical Research, Sydney, New South Wales, Australia; Liverpool and Macarthur Cancer Therapy Centres, Sydney, New South Wales, Australia.
Med Dosim. 2025 May 22. doi: 10.1016/j.meddos.2025.04.004.
Accurate contouring is crucial for optimal treatment outcomes, whether for nonadaptive radiotherapy with single images or adaptive radiotherapy (ART) with multiple images. For ART there are 2 common approaches to automated segmentation: (i) deformable image registration (DIR) propagation of prior contours from a previous image to a newer replanning image, or (ii) deep learning (DL) contours generated by models trained on datasets. The accuracy of the latter approach is affected by the size, diversity, and quality of the training dataset, while the accuracy of the former depends on the quality of the prior contours, the image contrast between image pairs, and the DIR algorithm used. This study assesses the accuracy of a commercially available pretrained DL model (Mirada DLC04, DLC13, DLC14) and DIR tools (Velocity, MIM, Eclipse) for generating contours in adaptive replanning scenarios in the head and neck region. Datasets from adaptive replanning in the head and neck region (n = 9 patients) included CTs (n = 18) with clinically approved contours and doses. Manual contours were compared against contours from the deep learning models (Mirada DLC04, DLC13, DLC14) and contours propagated by image registration (rigid, and deformable with MIM, Velocity, and Eclipse). Evaluation involved (a) contour clinical relevance scores, (b) contour grading scores, (c) assessment of manual and DL contouring style by a Radiation Oncologist Consultant against the Brouwer contouring guideline, and (d) accuracy assessment based on geometric and dosimetric metrics. These metrics included Dice similarity coefficient (DSC), mean distance to agreement (MDA), Hausdorff distance (HD), volume ratio, and dose ratio. Contours were shortlisted for statistical analysis based on (i) contour relevance, (ii) manual contour grading scores, and (iii) existing contouring data. Statistical analysis assessed geometric and dosimetric metrics, with the Velocity DIR as the comparator.
Contour relevance scores were highest for spinal cord, parotids, oral cavity, mandible, larynx, and brainstem. Contour grading scores indicated that most manual and DL contours were clinically acceptable with minor edits, except for brachial plexus and oral cavity, for which the Radiation Oncologist described variation in contouring style. The brainstem and parotids were shortlisted for statistical analysis, with data indicating that: (i) there was no statistical evidence (all p > 0.1) of dosimetric difference between DL and DIR contours; (ii) geometrically, the DIR algorithm (Velocity) was superior to the DL model (Mirada DLCExpert) in terms of MDA (p = 0.014) and HD (p < 0.001) for parotids, and volume difference for brainstem (p = 0.045); (iii) there was no statistical evidence (all p > 0.1) of geometric or dose difference for parotids and brainstem among rigid or deformable registrations. In our study of DL- and DIR-based contouring for adaptive radiotherapy in the head and neck region, DIR-based contours demonstrated superior geometric accuracy for the parotid glands and brainstem compared to the DL model (Mirada DLCExpert). Among DIR algorithms, no significant differences were observed, except for the MIM DIR volume ratio for brainstem. Our study found no significant dosimetric differences among DIR or DL contouring methods.
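For readers unfamiliar with the geometric metrics named in the abstract, the following is a minimal sketch of how DSC, volume ratio, MDA, and HD could be computed for a pair of binary contour masks. It is not the study's implementation: the function names, the toy masks, the voxel spacing, and the choice of a symmetric MDA (mean of surface distances in both directions) are all assumptions made for illustration, and definitions of MDA and HD vary between software packages. Dose ratio is omitted because it needs a dose grid.

```python
import numpy as np

def dice(test, ref):
    """Dice similarity coefficient between two binary masks."""
    test, ref = np.asarray(test, bool), np.asarray(ref, bool)
    total = test.sum() + ref.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(test, ref).sum() / total

def volume_ratio(test, ref):
    """Volume of the test contour divided by the reference volume."""
    return np.asarray(test, bool).sum() / np.asarray(ref, bool).sum()

def surface_points(mask, spacing):
    """Physical coordinates of mask voxels that touch the background."""
    mask = np.asarray(mask, bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = mask.copy()
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # A voxel is interior only if every face neighbour is in the mask.
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(mask & ~interior) * np.asarray(spacing, float)

def surface_distances(test, ref, spacing):
    """Nearest-surface distances test->ref and ref->test (brute force)."""
    p, q = surface_points(test, spacing), surface_points(ref, spacing)
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)
    return d.min(axis=1), d.min(axis=0)

def mda_and_hd(test, ref, spacing):
    """Symmetric mean distance to agreement and maximum Hausdorff distance."""
    t2r, r2t = surface_distances(test, ref, spacing)
    mda = np.concatenate([t2r, r2t]).mean()
    hd = max(t2r.max(), r2t.max())
    return mda, hd

# Toy example (hypothetical data): a 4x4x4 cube and a copy shifted by one voxel.
spacing = (1.0, 1.0, 1.0)  # assumed voxel size in mm
ref_mask = np.zeros((10, 10, 10), bool)
ref_mask[2:6, 2:6, 2:6] = True
new_mask = np.zeros_like(ref_mask)
new_mask[3:7, 2:6, 2:6] = True

dsc = dice(new_mask, ref_mask)
vr = volume_ratio(new_mask, ref_mask)
mda, hd = mda_and_hd(new_mask, ref_mask, spacing)
print(f"DSC={dsc:.3f} volume ratio={vr:.3f} MDA={mda:.3f} mm HD={hd:.3f} mm")
```

The brute-force pairwise distance matrix is fine for small structures such as this toy cube; for clinical-resolution masks a distance-transform approach would likely be needed to keep memory use manageable.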