IEEE Trans Med Imaging. 2022 Apr;41(4):757-770. doi: 10.1109/TMI.2020.3021387. Epub 2022 Apr 1.
Cancer diagnosis, prognosis, and therapeutic response predictions are based on morphological information from histology slides and molecular profiles from genomic data. However, most deep learning-based outcome prediction and grading paradigms are based on histology or genomics alone and do not make use of the complementary information in an intuitive manner. In this work, we propose Pathomic Fusion, an interpretable strategy for end-to-end multimodal fusion of histology image and genomic (mutations, CNV, RNA-Seq) features for survival outcome prediction. Our approach models pairwise feature interactions across modalities by taking the Kronecker product of unimodal feature representations, and controls the expressiveness of each representation via a gating-based attention mechanism. Following supervised learning, we are able to interpret and saliently localize features in each modality, and to understand how feature importance shifts when conditioning on multimodal input. We validate our approach using glioma and clear cell renal cell carcinoma datasets from The Cancer Genome Atlas (TCGA), which contain paired whole-slide image, genotype, and transcriptome data with ground-truth survival and histologic grade labels. In a 15-fold cross-validation, our results demonstrate that the proposed multimodal fusion paradigm improves prognostic determinations over ground-truth grading and molecular subtyping, as well as over unimodal deep networks trained on histology or genomic data alone. The proposed method establishes insight and theory on how to train deep networks on multimodal biomedical data in an intuitive manner, which will be useful for other problems in medicine that seek to combine heterogeneous data streams for understanding diseases and predicting response and resistance to treatment. Code and trained models are available at: https://github.com/mahmoodlab/PathomicFusion.
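The fusion mechanism described above can be illustrated with a minimal NumPy sketch: each unimodal embedding is scaled by a sigmoid gate computed from both modalities, a constant 1 is appended so the Kronecker product retains unimodal terms alongside the pairwise cross-modal interactions, and the outer (Kronecker) product forms the joint representation. The dimensions, weight matrices, and function names here are illustrative assumptions, not the paper's trained architecture (which uses learned PyTorch layers).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_kronecker_fusion(h_path, h_omic, W_gate_p, W_gate_o):
    """Sketch of gated Kronecker-product fusion of two unimodal embeddings.

    h_path : histology feature vector, shape (d_p,)
    h_omic : genomic feature vector, shape (d_o,)
    W_gate_p, W_gate_o : illustrative gating weights over the
        concatenated features, shapes (d_p, d_p+d_o) and (d_o, d_p+d_o).
    """
    joint = np.concatenate([h_path, h_omic])
    # Gating-based attention: modulate each modality's expressiveness
    # using information from both modalities.
    h_path = sigmoid(W_gate_p @ joint) * h_path
    h_omic = sigmoid(W_gate_o @ joint) * h_omic
    # Append a constant 1 so the fused vector contains the original
    # unimodal features as well as all pairwise interaction terms.
    h_path = np.concatenate([h_path, [1.0]])
    h_omic = np.concatenate([h_omic, [1.0]])
    # Kronecker product: fused vector of size (d_p + 1) * (d_o + 1).
    return np.kron(h_path, h_omic)

# Toy usage with random features and weights.
rng = np.random.default_rng(0)
d_p, d_o = 32, 32
h_p = rng.standard_normal(d_p)
h_o = rng.standard_normal(d_o)
W_p = 0.1 * rng.standard_normal((d_p, d_p + d_o))
W_o = 0.1 * rng.standard_normal((d_o, d_p + d_o))
fused = gated_kronecker_fusion(h_p, h_o, W_p, W_o)
print(fused.shape)  # (1089,) = 33 * 33
```

The appended 1s are what make this a Kronecker fusion rather than a plain outer product: the last entry of the fused vector is always 1, and the trailing row/column of the implicit outer-product matrix carries each modality's gated features unchanged, so the downstream survival head can still use unimodal signal.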