Wang Lujia, Hawkins-Daarud Andrea, Swanson Kristin R, Hu Leland S, Li Jing
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332 USA.
Mathematical Neuro-Oncology Lab in the Department of Neurosurgery at Mayo Clinic Arizona, Phoenix, AZ 85054 USA.
IEEE Trans Autom Sci Eng. 2022 Jul;19(3):2203-2215. doi: 10.1109/tase.2021.3076117. Epub 2021 May 13.
The ability to automatically generate spatial predictions for a variable of interest is desirable in many science and engineering domains. Consider precision medicine for cancer, where the goal is to match patients with treatments based on molecular markers identified in each patient's tumor. A substantial challenge, however, is that molecular markers can vary significantly across spatial locations within a tumor. If this spatial distribution could be predicted, the precision of cancer treatment could be greatly improved by adapting treatment to the spatial molecular heterogeneity. The task is challenging because no technology can measure the molecular markers at every spatial location within a tumor. Biopsy samples provide direct measurements, but they are scarce and local. Imaging, such as MRI, is global, but it provides only proxy (indirect) measurements. Mechanistic models or domain knowledge are also available, but they are often approximate or incomplete. This paper proposes a novel machine learning framework, the knowledge-infused global-local data fusion (KGL) model, that fuses these three sources of data and information to generate spatial predictions. A novel mathematical formulation is proposed and solved, supported by a theoretical study. We present a real-data application: predicting the spatial distribution of Tumor Cell Density (TCD), an important molecular marker for brain cancer. A total of 82 biopsy samples were acquired from 18 patients with glioblastoma, together with 6 MRI contrast images from each patient and biological knowledge encoded by a PDE simulator-based mechanistic model called Proliferation-Invasion (PI). KGL achieved the highest prediction accuracy and the lowest prediction uncertainty among a variety of competing methods. The result has important implications for providing individualized, spatially optimized treatment for each patient.