Department of Nuclear Medicine, University of Duisburg-Essen and German Cancer Consortium (DKTK)-University Hospital, Hufelandstraße 55, 45147, Essen, Germany.
West German Cancer Center, Essen-Münster, Germany.
Eur J Nucl Med Mol Imaging. 2021 Sep;48(10):3141-3150. doi: 10.1007/s00259-021-05270-x. Epub 2021 Mar 5.
Manual quantification of the metabolic tumor volume (MTV) from whole-body 18F-FDG PET/CT is time-consuming and therefore usually not performed in clinical routine. Neural networks have been shown to potentially assist nuclear medicine physicians in such quantification tasks. However, little is known about whether such networks must be designed for a specific type of cancer or can be applied across various cancers. Therefore, the aim of this study was to evaluate the accuracy of a neural network in a cancer type that was not used for its training.
Fifty consecutive breast cancer patients who underwent 18F-FDG PET/CT were included in this retrospective analysis. The PET-Assisted Reporting System (PARS) prototype, which uses a neural network trained on lymphoma and lung cancer 18F-FDG PET/CT data, was tasked with detecting pathological foci and determining their anatomical location. Consensus reads of two nuclear medicine physicians together with follow-up data served as the diagnostic reference standard; 1072 18F-FDG-avid foci were manually segmented. The accuracy of the neural network was evaluated with regard to lesion detection, anatomical position determination, and total tumor volume quantification.
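As an illustration of the volume quantification step, the volume of a segmented region is the number of segmented voxels multiplied by the volume of one voxel. The following is a minimal sketch under that general definition only; it is not the PARS implementation, and the function name, the nested-list mask layout, and the voxel size are hypothetical choices made for the example.

```python
from math import prod


def metabolic_tumor_volume_ml(mask, voxel_size_mm):
    """Return the volume in millilitres of all voxels flagged in a 3D binary mask.

    mask: nested lists indexed [slice][row][column]; a truthy value marks a lesion voxel.
    voxel_size_mm: (dx, dy, dz) voxel edge lengths in millimetres (assumed values below).
    """
    n_voxels = sum(1 for plane in mask for row in plane for value in row if value)
    voxel_volume_mm3 = prod(voxel_size_mm)
    return n_voxels * voxel_volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy 2 x 2 x 3 mask with five lesion voxels (illustrative data, not from the study).
toy_mask = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 1, 0]],
]
mtv = metabolic_tumor_volume_ml(toy_mask, (4.0, 4.0, 5.0))
```

Summing this quantity over all lesion masks of a patient would give a whole-body value, and restricting the masks to one organ would give an organ-wise value.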
When only PERCIST-measurable foci were considered, the neural network displayed high per-patient sensitivity and specificity in detecting suspicious 18F-FDG foci (92%; CI = 79-97% and 98%; CI = 94-99%, respectively). When all FDG-avid foci were considered, sensitivity dropped (39%; CI = 30-50%). The localization accuracy was high for body part (98%; CI = 95-99%), region (88%; CI = 84-90%), and subregion (79%; CI = 74-84%). There was a high correlation between AI-derived and manually segmented MTV (R = 0.91; p < 0.001). AI-derived whole-body MTV (HR = 1.275; CI = 1.208-1.713; p < 0.001) was a significant prognosticator of overall survival. AI-derived lymph node MTV (HR = 1.190; CI = 1.022-1.384; p = 0.025) and liver MTV (HR = 1.149; CI = 1.001-1.318; p = 0.048) were predictive of overall survival in a multivariate analysis.
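For readers unfamiliar with how a sensitivity and its confidence interval are obtained from detection counts, the following minimal sketch may help. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the counts are hypothetical values chosen for illustration, not data from the study.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 23 true positives and 2 false negatives.
true_positives, false_negatives = 23, 2
sensitivity = true_positives / (true_positives + false_negatives)
low, high = wilson_interval(true_positives, true_positives + false_negatives)
```

Specificity follows the same pattern, with true negatives as the successes and true negatives plus false positives as the total.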
Although trained on lymphoma and lung cancer, PARS showed good accuracy in the detection of PERCIST-measurable lesions. Therefore, the neural network does not appear prone to the Clever Hans effect. However, the network showed poor accuracy when all manually segmented lesions were used as the reference standard. Both whole-body and organ-wise MTV were significant prognosticators of overall survival in advanced breast cancer.