From the Division of Nuclear Medicine and Molecular Imaging, Geneva University Hospital, Geneva, Switzerland.
Research Centre for Nuclear Medicine, Shariati Hospital, Tehran University of Medical Sciences, Tehran, Iran.
Clin Nucl Med. 2021 Nov 1;46(11):872-883. doi: 10.1097/RLU.0000000000003789.
The availability of automated, accurate, and robust gross tumor volume (GTV) segmentation algorithms is critical for the management of head and neck cancer (HNC) patients. In this work, we evaluated 3 state-of-the-art deep learning algorithms combined with 8 different loss functions for PET image segmentation using a comprehensive training set and evaluated their performance on an external validation set of HNC patients.
18F-FDG PET/CT images of 470 patients presenting with HNC, with manually defined GTVs serving as the standard of reference, were used for training (340 patients), evaluation (30 patients), and testing (100 patients from different centers) of these algorithms. PET image intensity was converted to SUVs and normalized to the range (0-1) using the SUVmax of the whole data set. PET images were cropped to 12 × 12 × 12 cm3 subvolumes with isotropic voxel spacing of 3 × 3 × 3 mm3, containing the whole tumor and neighboring background including lymph nodes. We used different approaches for data augmentation, including rotation (-15 degrees, +15 degrees), scaling (-20%, +20%), random flipping (3 axes), and elastic deformation (sigma = 1 and proportion to deform = 0.7), to enlarge the training set. Three state-of-the-art networks, including Dense-VNet, NN-UNet, and Res-Net, with 8 different loss functions, including Dice, generalized Wasserstein Dice loss, Dice plus XEnt loss, generalized Dice loss, cross-entropy, sensitivity-specificity, and Tversky, were used. Overall, 28 different networks were built. Standard image segmentation metrics, including Dice similarity, image-derived PET metrics, and first-order and shape radiomic features, were used for performance assessment of these algorithms.
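The two quantities central to the pipeline above are the dataset-wide SUVmax normalization and the Dice similarity used for evaluation. A minimal NumPy sketch of both follows; function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def normalize_suv(volume, dataset_suv_max):
    """Scale SUV intensities into the (0-1) range using the SUVmax
    of the whole data set, as described in the methods.
    (Illustrative sketch; names are assumptions.)"""
    return np.clip(volume / dataset_suv_max, 0.0, 1.0)

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks:
    2 * |A ∩ B| / (|A| + |B|), with eps to avoid division by zero."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

The Dice coefficient returns 1.0 for identical masks and approaches 0 for disjoint ones, which is the scale on which the results below are reported.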
The best results in terms of Dice coefficient (mean ± SD) were achieved by cross-entropy for Res-Net (0.86 ± 0.05; 95% confidence interval [CI], 0.85-0.87) and Dense-VNet (0.85 ± 0.058; 95% CI, 0.84-0.86), and by Dice plus XEnt for NN-UNet (0.87 ± 0.05; 95% CI, 0.86-0.88). The difference between the 3 networks was not statistically significant (P > 0.05). The percent relative error (RE%) of SUVmax quantification was less than 5% in networks with a Dice coefficient above 0.84, whereas the lowest RE% (0.41%) was achieved by Res-Net with cross-entropy loss. For the maximum 3-dimensional diameter and sphericity shape features, all networks achieved an RE% ≤5% and ≤10%, respectively, reflecting small variability.
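The RE% figures above compare an image-derived metric (e.g., SUVmax or a shape feature) extracted from the predicted segmentation against the same metric from the manual reference. A one-line sketch of that computation, under the usual definition of percent relative error (an assumption, since the abstract does not state the formula):

```python
def relative_error_pct(predicted, reference):
    """Percent relative error of an image-derived metric relative to
    the manual-reference value: RE% = 100 * (predicted - reference) / reference.
    (Assumed definition; not taken from the source.)"""
    return 100.0 * (predicted - reference) / reference
```

For example, a predicted SUVmax of 10.5 against a reference of 10.0 yields an RE% of 5%, the threshold quoted above.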
Deep learning algorithms exhibited promising performance for automated GTV delineation on HNC PET images. Different loss functions performed competitively across networks; cross-entropy for Res-Net and Dense-VNet, and Dice plus XEnt for NN-UNet, emerged as reliable configurations for GTV delineation. Caution should be exercised for clinical deployment owing to the occurrence of outliers in deep learning-based algorithms.