Yan Rui, Lv Zhilong, Yang Zhidong, Lin Senlin, Zheng Chunhou, Zhang Fa
IEEE J Biomed Health Inform. 2023 Aug 22;PP. doi: 10.1109/JBHI.2023.3307584.
Transformer-based methods offer a good opportunity to model the global context of gigapixel whole slide images (WSIs). However, two main problems remain in applying Transformers to WSI-based survival analysis. First, training data for survival analysis is limited, which makes models prone to overfitting; the problem is worse for Transformer-based models, which require large-scale data to train. Second, a WSI has extremely high resolution (up to 150,000 x 150,000 pixels) and is typically organized as a multi-resolution pyramid. A vanilla Transformer cannot model the hierarchical structure of a WSI (such as patch cluster-level relationships), so it cannot learn a hierarchical WSI representation. To address these problems, we propose a novel Sparse and Hierarchical Transformer (SH-Transformer) for survival analysis. Specifically, we introduce sparse self-attention to alleviate overfitting and propose a hierarchical Transformer structure to learn the hierarchical WSI representation. Experimental results on three WSI datasets show that the proposed framework outperforms state-of-the-art methods.
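The abstract does not say how the self-attention is sparsified. As a minimal sketch only, the NumPy example below assumes one common form of sparse attention, a top-k rule in which each patch token attends only to its k highest-scoring tokens. The function name `topk_sparse_self_attention`, the projection matrices, and the choice of k are hypothetical illustrations, not the SH-Transformer implementation.

```python
import numpy as np


def topk_sparse_self_attention(x, w_q, w_k, w_v, k):
    """Single-head self-attention where each query keeps only its k largest scores.

    Assumption: top-k masking stands in for the paper's sparse self-attention,
    whose exact form is not given in the abstract.
    """
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ key.T / np.sqrt(q.shape[-1])  # (n_tokens, n_tokens)

    # k-th largest score in each row; anything below it is masked out.
    kth = np.partition(scores, -k, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)

    # Numerically stable softmax over the surviving entries.
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy input: 16 patch embeddings of dimension 8 (random stand-ins for WSI patch features).
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
x = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

out, weights = topk_sparse_self_attention(x, w_q, w_k, w_v, k=4)
```

With k much smaller than the number of patches, each output token is a weighted average of only a few value vectors. This is one plausible way that sparsity could limit overfitting when few slides are available.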