Clinical Evaluation of Deep Learning for Tumor Delineation on ¹⁸F-FDG PET/CT of Head and Neck Cancer
Author Information
Kovacs David G, Ladefoged Claes N, Andersen Kim F, Brittain Jane M, Christensen Charlotte B, Dejanovic Danijela, Hansen Naja L, Loft Annika, Petersen Jørgen H, Reichkendler Michala, Andersen Flemming L, Fischer Barbara M
Affiliations
Department of Clinical Physiology and Nuclear Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark;
Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
Publication Information
J Nucl Med. 2024 Feb 22;65(4):623-9. doi: 10.2967/jnumed.123.266574.
Artificial intelligence (AI) may decrease ¹⁸F-FDG PET/CT-based gross tumor volume (GTV) delineation variability and automate tumor-volume-derived image biomarker extraction. Hence, we aimed to identify and evaluate promising state-of-the-art deep learning methods for head and neck cancer (HNC) PET GTV delineation. We trained and evaluated deep learning methods using retrospectively included scans of HNC patients referred for radiotherapy between January 2014 and December 2019 (ISRCTN16907234). We used 3 test datasets: an internal set to compare methods, another internal set to compare AI-to-expert variability and expert interobserver variability (IOV), and an external set to compare internal and external AI-to-expert variability. Expert PET GTVs were used as the reference standard. Our benchmark IOV was measured using the PET GTVs of 6 experts. The primary outcome was the Dice similarity coefficient (DSC). ANOVA was used to compare methods, a paired t test was used to compare AI-to-expert variability and expert IOV, an unpaired t test was used to compare internal and external AI-to-expert variability, and post hoc Bland-Altman analysis was used to evaluate biomarker agreement. In total, 1,220 ¹⁸F-FDG PET/CT scans of 1,190 patients (mean age ± SD, 63 ± 10 y; 858 men) were included, and 5 deep learning methods were trained using 5-fold cross-validation (n = 805). The nnU-Net method achieved the highest similarity (DSC, 0.80 [95% CI, 0.77-0.86]; n = 196). We found no evidence of a difference between expert IOV and AI-to-expert variability (DSC, 0.78 for AI vs. 0.82 for experts; mean difference of 0.04 [95% CI, -0.01 to 0.09]; P = 0.12; n = 64). We found no evidence of a difference between internal and external AI-to-expert variability (DSC, 0.80 internally vs. 0.81 externally; mean difference of 0.004 [95% CI, -0.05 to 0.04]; P = 0.87; n = 125). PET GTV-derived biomarkers of AI were in good agreement with those of experts. Deep learning can be used to automate ¹⁸F-FDG PET/CT tumor-volume-derived imaging biomarkers, and the deep-learning-based volumes have the potential to assist clinical tumor volume delineation in radiation oncology.
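For readers unfamiliar with the study's primary outcome metric, the sketch below illustrates how a Dice similarity coefficient (DSC) between an AI-predicted GTV mask and an expert reference mask is typically computed. This is a minimal, hedged example: the function, array names, and synthetic masks are illustrative assumptions and do not reproduce the study's evaluation pipeline.

```python
import numpy as np

def dice_similarity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks.

    DSC = 2 * |A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty.
    """
    pred = pred.astype(bool)
    ref = ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Illustrative example with small synthetic masks (not study data):
ai_gtv = np.zeros((4, 4), dtype=bool)
expert_gtv = np.zeros((4, 4), dtype=bool)
ai_gtv[1:3, 1:3] = True      # hypothetical AI-delineated voxels
expert_gtv[1:3, 1:4] = True  # hypothetical expert-delineated voxels
print(f"DSC = {dice_similarity(ai_gtv, expert_gtv):.2f}")  # DSC = 0.80
```

In practice the same computation would be applied voxelwise to 3-dimensional PET GTV masks; a DSC of 1.0 indicates perfect overlap and 0.0 indicates no overlap.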