Chen Jinchao, Liu Pei, Chen Chen, Su Ying, Wang Jiajia, Chen Cheng, Ai Xiantao, Lv Xiaoyi
College of Computer Science and Technology, Xinjiang University, Urumqi, 830046, China.
College of Software, Xinjiang University, Urumqi, 830046, China.
Interdiscip Sci. 2025 Sep 26. doi: 10.1007/s12539-025-00744-0.
Survival prediction depends on multiple factors, such as histopathological image data and omics data, making it a typical multimodal task. In this work, we introduce semantic annotations for genes in different cell types based on cell biology knowledge, enabling the model to achieve interpretability at the cellular level. Because these cell-type annotations are derived from the distinct site of origin of each cancer type, they align more closely with morphological features in whole slide images (WSIs) and address the problem of ambiguous genomic annotation. We then propose SurvTransformer, a multimodal fusion model that uses multi-layer attention to fuse cell type tags (CTTs) and WSIs for survival prediction. Finally, through attention and integrated gradient attribution, the model provides biologically meaningful interpretable analyses at three levels: cell type, gene, and histopathological image. Comparative experiments show that SurvTransformer achieves the highest concordance index (C-index) on four cancer datasets, and the survival curves it generates show statistically significant stratification. Ablation experiments show that SurvTransformer outperforms variants built on alternative labeling methods and attention representations. For interpretability, case studies validate the effectiveness of SurvTransformer at the cell type, gene, and histopathological image levels.
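The integrated gradient attribution mentioned above distributes the change in a model's output between a baseline input and the actual input across input features, by averaging gradients along the straight-line path between them. The following is a minimal illustrative sketch, not the paper's implementation: the function names (`integrated_gradients`, `f`, `grad_f`) and the linear toy model are assumptions for demonstration.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate integrated gradients: average the gradient of the
    model along the straight-line path from `baseline` to `x`, then
    scale elementwise by the input difference (x - baseline)."""
    alphas = np.linspace(0.0, 1.0, steps + 1)
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)  # Riemann-style average over the path
    return (x - baseline) * avg_grad

# Toy linear model f(x) = w . x with analytic gradient w (hypothetical).
w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, 1.0, 4.0])
baseline = np.zeros(3)
attr = integrated_gradients(lambda z: w, x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline).
assert np.allclose(attr.sum(), w @ x - w @ baseline)
```

For a linear model the path gradient is constant, so the attribution reduces exactly to `(x - baseline) * w`; for a neural network the same averaging is done with autograd over the interpolated inputs.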
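The metric used in the comparative experiments, Harrell's concordance index (C-index), measures how often the predicted risk ordering agrees with the observed survival ordering over comparable patient pairs, handling right-censoring. A minimal sketch of the standard definition (illustrative only, not the paper's evaluation code; the function name is an assumption):

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs whose predicted
    risk ordering matches the observed survival-time ordering.
    times: observed times; events: 1 = death observed, 0 = censored;
    risks: predicted risk scores (higher = shorter expected survival)."""
    concordant, permissible = 0.0, 0.0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:          # order so i has the earlier time
            i, j = j, i
        if not events[i]:
            continue                     # earlier subject censored: pair not comparable
        if times[i] == times[j] and events[j]:
            continue                     # tied event times: skipped (simplified)
        permissible += 1
        if risks[i] > risks[j]:
            concordant += 1.0            # earlier death had higher risk: concordant
        elif risks[i] == risks[j]:
            concordant += 0.5            # tied risks count as half
    return concordant / permissible if permissible else 0.0

# Perfect ranking: the patient who dies earliest has the highest risk.
print(concordance_index([2, 4, 6], [1, 1, 1], [3.0, 2.0, 1.0]))  # 1.0
```

A C-index of 1.0 means perfect ranking, 0.5 is random; the abstract's comparison reports this value per cancer dataset.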