Asbach John C, Singh Anurag K, Iovoli Austin J, Farrugia Mark, Le Anh H
Jacobs School of Medicine and Biomedical Sciences, State University of New York at Buffalo, Buffalo, New York, USA.
Department of Radiation Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA.
Med Phys. 2025 Apr;52(4):2675-2687. doi: 10.1002/mp.17672. Epub 2025 Feb 10.
Given the recent emphasis on multimodal neural networks for complex modeling tasks, outcome prediction for a course of treatment can be framed as a fundamentally multimodal problem. A patient's response to treatment varies with their specific anatomy and the proposed treatment plan; these factors are spatial and closely related. However, other factors may also matter, such as non-spatial descriptive clinical characteristics, which can be structured as tabular data. It is critical to provide models with as comprehensive a patient representation as possible, but inputs with differing data structures are incompatible in raw form, so traditional models that consider these inputs require feature engineering before modeling. In neural networks, feature engineering can instead be integrated organically into the model itself, under one governing optimization, rather than performed prescriptively beforehand. However, the native incompatibility of different data structures must still be addressed. Methods that reconcile structural incompatibilities among multimodal model inputs are collectively called data fusion. We present a novel joint early pre-spatial (JEPS) fusion technique and demonstrate that differences in fusion approach can produce significant differences in model performance even when the data are identical.
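To make the structural mismatch concrete: a CT or dose array is a 3D grid, while clinical covariates form a flat vector. One generic way to reconcile them, sketched below in plain Python, tiles each tabular feature into a constant-valued channel with the same shape as the volume so both modalities can enter one convolutional input. This illustrates generic input-level (early) fusion only; it is not the JEPS method, and the function names and the `[z][y][x]` layout are hypothetical choices.

```python
from typing import List

Volume = List[List[List[float]]]  # one channel, indexed [z][y][x] (hypothetical layout)


def tile_features(features: List[float], depth: int, height: int, width: int) -> List[Volume]:
    """Broadcast each scalar clinical feature into a constant-valued 3D channel."""
    return [
        [[[value] * width for _ in range(height)] for _ in range(depth)]
        for value in features
    ]


def early_fuse(image_channels: List[Volume], features: List[float]) -> List[Volume]:
    """Append tiled clinical channels after the imaging channels (e.g. CT, dose)."""
    depth = len(image_channels[0])
    height = len(image_channels[0][0])
    width = len(image_channels[0][0][0])
    return image_channels + tile_features(features, depth, height, width)
```

For example, fusing a CT channel and a dose channel with two clinical values (say, age and a binary flag) yields four channels of identical spatial shape, which a 3D CNN can then convolve together.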
To present a novel pre-spatial fusion technique for volumetric neural networks and demonstrate its impact on model performance for pretreatment prediction of overall survival (OS).
From a retrospective cohort of 531 head and neck patients treated at our clinic, we prepared an OS dataset of 222 data-complete cases at a 2-year post-treatment time threshold. Each patient's data included CT imaging, a dose array, the approved structure set, and a tabular summary of the patient's demographics and survey data. To establish single-modality baselines, we fit both a Cox proportional hazards (CPH) model and a dense neural network on the tabular data alone, then trained a 3D convolutional neural network (CNN) on the volume data alone. Finally, we trained five competing architectures that fuse both modalities: two early fusion models, a late fusion model, a traditional joint fusion model, and the novel JEPS model, in which clinical data are merged into training upstream of most convolution operations. We used standardized 10-fold cross-validation to compare all models directly on identical train/test splits of patients, with the area under the receiver operating characteristic curve (AUC) as the primary performance metric. We used a two-tailed Student's t-test to assess the statistical significance (p-value threshold 0.05) of any observed performance differences.
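The evaluation protocol above rests on three small mechanics: turning follow-up into a binary label at the 2-year horizon, reusing one fixed set of 10 folds so every model is scored on identical patient splits, and computing AUC on each test fold. The plain-Python sketch below illustrates each under stated assumptions: the labeling convention (1 = death before 2 years, 0 = alive at 2 years, cases censored before 2 years excluded) and the unstratified seeded shuffle are hypothetical choices, not details confirmed by the abstract, and all function names are illustrative.

```python
import random
from typing import List, Optional, Sequence


def two_year_label(followup_years: float, died: bool, threshold: float = 2.0) -> Optional[int]:
    """Binary OS label at a fixed horizon (hypothetical convention).

    1 = died before the threshold, 0 = followed alive to the threshold,
    None = censored before the threshold (outcome unknown, so excluded).
    """
    if died and followup_years < threshold:
        return 1
    if followup_years >= threshold:
        return 0
    return None


def make_folds(n_patients: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle patient indices once with a fixed seed and deal them into k folds.

    Reusing the returned folds for every model gives identical train/test splits;
    fold i is the test set and the remaining folds form the training set.
    """
    order = list(range(n_patients))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability a positive outscores a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because the folds depend only on the patient count and the seed, calling `make_folds(222)` once and sharing the result across the CPH, dense, CNN, and fusion models keeps per-fold AUCs directly comparable.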
The JEPS design scored the highest, achieving a mean AUC of 0.779 ± 0.080. The late fusion model and clinical-only CPH model scored second and third highest with 0.746 ± 0.066 and 0.720 ± 0.091 mean AUC, respectively. The performance differences between these three models were not statistically significant. All other comparison models scored significantly worse than the top performing JEPS model.
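As a rough consistency check on the reported non-significance, the sketch below computes a pooled two-sample Student t statistic from the published summaries. It assumes each ± value is a sample standard deviation over n = 10 fold-level AUCs and that the folds are treated as independent samples. The abstract does not state whether a paired test was used, and a paired test would need the per-fold AUCs, which are not given here. The two-tailed critical value for 18 degrees of freedom at α = 0.05 is about 2.101.

```python
import math


def pooled_t(mean_a: float, sd_a: float, mean_b: float, sd_b: float, n: int = 10) -> float:
    """Student t statistic for two equal-size samples with pooled variance."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2  # equal n, so pooling is a simple average
    standard_error = math.sqrt(pooled_var * (2 / n))
    return (mean_a - mean_b) / standard_error


T_CRIT_DF18 = 2.101  # two-tailed, alpha = 0.05, df = 2n - 2 = 18

jeps = (0.779, 0.080)  # mean AUC, SD (reported)
late = (0.746, 0.066)
cph = (0.720, 0.091)

for name, other in [("late fusion", late), ("CPH", cph)]:
    t = pooled_t(*jeps, *other)
    print(f"JEPS vs {name}: t = {t:.2f}, significant = {abs(t) > T_CRIT_DF18}")
```

Under these assumptions the statistics come out near 1.01 (versus late fusion) and 1.54 (versus CPH), both below 2.101, which agrees with the reported lack of significance among the top three models.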
For our OS evaluation, the JEPS fusion architecture achieves better integration of inputs and significantly improves predictive performance over most of the common multimodal approaches we compared. The JEPS fusion technique is easily applied to any volumetric CNN.
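The abstract describes JEPS only as merging clinical data upstream of most convolution operations under a joint optimization. The sketch below shows one plausible reading of that idea, not the authors' architecture. A small dense layer (weights `w` and bias `b` are hypothetical, and in a real model would be learned jointly with the CNN) projects the clinical vector to a few values. Each value is broadcast over the spatial grid of an early feature map and appended as an extra channel, so every subsequent convolution sees both modalities. Only the forward pass is shown, in plain Python; `dense_relu` and `inject_clinical` are illustrative names.

```python
from typing import List

Volume = List[List[List[float]]]  # one feature-map channel, indexed [z][y][x]


def dense_relu(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    """Hypothetical dense layer: out[j] = relu(sum_i w[j][i] * x[i] + b[j])."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bj) for row, bj in zip(w, b)]


def inject_clinical(feature_map: List[Volume], clinical: List[float],
                    w: List[List[float]], b: List[float]) -> List[Volume]:
    """Append projected clinical values as constant channels of an early feature map."""
    depth = len(feature_map[0])
    height = len(feature_map[0][0])
    width = len(feature_map[0][0][0])
    projected = dense_relu(clinical, w, b)
    extra = [[[[v] * width for _ in range(height)] for _ in range(depth)] for v in projected]
    return feature_map + extra  # channel-wise concatenation; spatial shape unchanged
```

Because the injected channels keep the feature map's spatial shape, the same step can in principle be inserted after the first block of any volumetric CNN, with the following convolution's input-channel count increased accordingly.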