Boellaard Ronald, Rahmim Arman, Eertink Jacoba J, Duehrsen Ulrich, Kurch Lars, Lugtenburg Pieternella J, Wiegers Sanne E, Zwezerijnen Gerben J C, Zijlstra Josée M, Heymans Martijn W, Buvat Irène
Department of Radiology and Nuclear Medicine, Amsterdam UMC, Cancer Center Amsterdam, Amsterdam, The Netherlands;
Departments of Radiology and Physics, University of British Columbia, Vancouver, British Columbia, Canada.
J Nucl Med. 2025 Aug 1;66(8):1169-1175. doi: 10.2967/jnumed.124.269425.
In medical imaging, challenges are competitions that aim to provide a fair comparison of different methodologic solutions to a common problem. Challenges typically focus on addressing real-world problems, such as segmentation, detection, and prediction tasks, using various types of medical images and associated data. Here, we describe the organization and results of such a challenge to compare machine-learning models for predicting survival in patients with diffuse large B-cell lymphoma using a baseline 18F-FDG PET/CT radiomics dataset.

This challenge aimed to predict progression-free survival (PFS) in patients with diffuse large B-cell lymphoma, either as a binary outcome (shorter than 2 y versus longer than 2 y) or as a continuous outcome (survival in months). All participants were provided with a radiomic training dataset, which included the ground truth survival for designing a predictive model, and a radiomic test dataset without ground truth. The figures of merit (FOMs) used to assess model performance were the root-mean-square error for continuous outcomes and the C-index for 1-, 2-, and 3-y PFS binary outcomes. The challenge was endorsed and initiated by the Society of Nuclear Medicine and Molecular Imaging AI Task Force.

Fifteen teams submitted 19 models for predicting PFS as a continuous outcome. Among those models, external validation identified 6 models that performed similarly to a simple general linear reference model using only SUV and total metabolic tumor volume (TMTV). Nine teams submitted 12 models for predicting binary outcomes. External validation showed that 1 model had higher, but nonsignificant, C-index values compared with those obtained by a simple logistic regression model using SUV and TMTV.
Some of the radiomic-based machine-learning models developed by participants showed better FOMs than did simple linear or logistic regression models based on SUV and TMTV only, although the differences in observed FOMs were nonsignificant. This suggests that, for the challenge dataset, adding sophisticated radiomic features and using machine learning provided limited or no added value when developing models for outcome prediction.
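As an illustration of the two figures of merit named above, the following is a minimal sketch of how a root-mean-square error and a C-index for a binary outcome could be computed. It is not the challenge's evaluation code: the function names `rmse` and `c_index_binary`, the handling of tied scores, and all numbers are assumptions made for this example. For a binary outcome without censoring, the C-index is computed here as the fraction of (event, non-event) pairs in which the event case receives the higher predicted risk, which is equivalent to the area under the ROC curve.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted PFS (months)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def c_index_binary(labels, scores):
    """Concordance for a binary outcome (assumed: 1 = event, 0 = no event).

    Counts the (event, non-event) pairs in which the event case has the
    higher score; tied scores count as half-concordant (an assumption).
    """
    event_scores = [s for l, s in zip(labels, scores) if l == 1]
    nonevent_scores = [s for l, s in zip(labels, scores) if l == 0]
    if not event_scores or not nonevent_scores:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(nonevent_scores))


# Made-up toy values, not challenge data.
observed_pfs = [10, 24, 36, 60]    # months
predicted_pfs = [12, 20, 40, 58]   # months
print(round(rmse(observed_pfs, predicted_pfs), 3))

progressed = [1, 1, 0, 0]          # 1 = event within the time horizon
predicted_risk = [0.9, 0.4, 0.5, 0.1]
print(c_index_binary(progressed, predicted_risk))
```

A C-index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, whereas a lower root-mean-square error indicates closer agreement between predicted and observed survival times.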