Yang Hailong, Wang Jia, Wang Wenyan, Shi Shufang, Liu Lijing, Yao Yuhua, Tian Geng, Wang Peizhen, Yang Jialiang
School of Electrical and Information Engineering, Anhui University of Technology, No. 1530 Maxiang Road, Huashan District, Ma'anshan, Anhui 243032, China.
Department of Sciences, Geneis Beijing Co., Ltd., No. 31 Xinbei Road, Laiguangying, Chaoyang District, Beijing 100102, China.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf209.
Accurate prediction of patient survival in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, which undermines confidence in their predictions. This study presents MMSurv, an interpretable multimodal deep learning model for predicting survival across different cancer types. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into a one-dimensional feature vector. We then encode clinical features with word-embedding techniques inspired by natural language processing. To better exploit the complementarity of multimodal data, this study proposes a novel multimodal fusion method that integrates compact bilinear pooling with a Transformer architecture. The fused features are then processed through a dual-layer multi-instance learning model to remove prognosis-irrelevant image tiles and predict each patient's survival risk. Furthermore, we employ cell segmentation to investigate the cellular composition of the tiles that receive high attention from the model, thereby enhancing its interpretability. We evaluate our approach on six cancer types from The Cancer Genome Atlas. The results demonstrate that utilizing multimodal data yields higher predictive accuracy than single-modal image data, with the average C-index increasing from 0.6750 to 0.7283. Additionally, we compare our proposed model with state-of-the-art methods using the C-index under five-fold cross-validation, observing a significant average improvement of nearly 10% in performance.
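The abstract does not provide implementation details, but a minimal sketch of the general fusion technique it names, compact bilinear pooling followed by a Transformer encoder layer, might look like the following in PyTorch. The `CompactBilinearPooling` module, all dimensions, and the usage at the end are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn

class CompactBilinearPooling(nn.Module):
    """Count-sketch approximation of the bilinear (outer-product) interaction
    between two modality feature vectors (after Gao et al., 2016)."""
    def __init__(self, dim1, dim2, out_dim):
        super().__init__()
        self.out_dim = out_dim
        # Fixed random hash indices and signs for each input modality.
        self.register_buffer("h1", torch.randint(out_dim, (dim1,)))
        self.register_buffer("s1", torch.randint(0, 2, (dim1,)).float() * 2 - 1)
        self.register_buffer("h2", torch.randint(out_dim, (dim2,)))
        self.register_buffer("s2", torch.randint(0, 2, (dim2,)).float() * 2 - 1)

    def _sketch(self, x, h, s):
        # Scatter signed features into the sketch dimension (count sketch).
        sketch = x.new_zeros(*x.shape[:-1], self.out_dim)
        return sketch.index_add(x.dim() - 1, h, x * s)

    def forward(self, x1, x2):
        # Element-wise product in the frequency domain is a circular
        # convolution of the two sketches, approximating the outer product.
        f1 = torch.fft.rfft(self._sketch(x1, self.h1, self.s1), dim=-1)
        f2 = torch.fft.rfft(self._sketch(x2, self.h2, self.s2), dim=-1)
        return torch.fft.irfft(f1 * f2, n=self.out_dim, dim=-1)

# Hypothetical usage: fuse tile-level image features with a patient-level
# clinical/omics embedding, then contextualize the fused tile tokens.
cbp = CompactBilinearPooling(dim1=1024, dim2=256, out_dim=512)
img_tiles = torch.randn(200, 1024)            # 200 tile feature vectors
clin = torch.randn(256).expand(200, 256)      # broadcast clinical embedding
fused = cbp(img_tiles, clin)                  # (200, 512) fused tile tokens
encoder = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
tokens = encoder(fused.unsqueeze(0))          # (1, 200, 512)
```

In this sketch the clinical embedding is simply broadcast to every tile before fusion; how MMSurv actually aligns patient-level and tile-level features is not specified in the abstract.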
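The evaluation protocol named above (C-index under five-fold cross-validation) can be illustrated with the rough sketch below, using scikit-learn's KFold and lifelines' concordance_index. The `fit_predict_risk` callable, feature matrix `X`, survival times `t`, and event indicators `e` are placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import KFold
from lifelines.utils import concordance_index

def cross_validated_cindex(fit_predict_risk, X, t, e, n_splits=5, seed=0):
    """Average concordance index (C-index) over k cross-validation folds.

    fit_predict_risk(X_train, t_train, e_train, X_test) -> risk scores,
    where higher risk should correspond to shorter survival. The callable
    stands in for any survival model (e.g. MMSurv or a baseline).
    """
    scores = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        risk = fit_predict_risk(X[train_idx], t[train_idx], e[train_idx],
                                X[test_idx])
        # lifelines expects a score that is higher for longer survival,
        # so negate the predicted risk before computing the C-index.
        scores.append(concordance_index(t[test_idx], -risk, e[test_idx]))
    return float(np.mean(scores))
```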