Pisula Juan I, Bozek Katarzyna
Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.
Sci Rep. 2025 Feb 15;15(1):5612. doi: 10.1038/s41598-025-88139-5.
From computer vision to protein fold prediction, Language Models (LMs) have proven successful in transferring their representation of sequential data to a broad spectrum of tasks beyond the domain of natural language processing. Whole Slide Image (WSI) analysis in digital pathology lends itself naturally to transformer-based architectures. In a pre-processing step analogous to text tokenization, large microscopy images are tessellated into smaller image patches. However, because a WSI comprises thousands of such patches, WSI classification has not been addressed with deep transformer architectures, let alone with available text-pre-trained deep transformer language models. We introduce SeqShort, a multi-head attention-based sequence shortening layer that summarizes a large WSI into a short, fixed-length sequence of feature vectors by removing redundant visual information. Our sequence shortening mechanism not only reduces the computational cost of self-attention on large inputs, but also makes it possible to apply standard positional encodings to the previously unordered bag of patches that composes a WSI. We use SeqShort to effectively classify WSIs in several digital pathology tasks with a deep, text-pre-trained transformer model while fine-tuning less than 0.1% of its parameters, demonstrating that its knowledge of natural language transfers well to this domain.
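To illustrate the idea of an attention-based sequence shortening layer, the sketch below shows how a variable-length bag of patch feature vectors can be summarized into a short, fixed-length sequence via cross-attention with learned query tokens. This is a minimal PyTorch sketch under our own assumptions, not the authors' released implementation; names such as `SeqShortLayer`, `num_queries`, and `feat_dim` are illustrative.

```python
# Minimal sketch of a multi-head attention-based sequence shortening layer.
# Assumption: learned queries cross-attend over the bag of patch features;
# the real SeqShort layer may differ in details.
import torch
import torch.nn as nn

class SeqShortLayer(nn.Module):
    """Summarize a variable-length bag of patch features into a short,
    fixed-length sequence via cross-attention with learned queries."""
    def __init__(self, feat_dim: int = 768, num_queries: int = 128, num_heads: int = 8):
        super().__init__()
        # Learned query tokens define the length of the shortened sequence.
        self.queries = nn.Parameter(torch.randn(1, num_queries, feat_dim) * 0.02)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Standard positional encodings become meaningful once the previously
        # unordered bag is mapped to a fixed-length sequence.
        self.pos_emb = nn.Parameter(torch.zeros(1, num_queries, feat_dim))

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, n_patches, feat_dim); n_patches may be thousands.
        q = self.queries.expand(patch_feats.size(0), -1, -1)
        shortened, _ = self.attn(q, patch_feats, patch_feats)
        return shortened + self.pos_emb  # (batch, num_queries, feat_dim)

# Usage: 10,000 patch embeddings reduced to a 128-token sequence that a
# text-pre-trained transformer can process with standard self-attention.
layer = SeqShortLayer()
wsi_bag = torch.randn(1, 10_000, 768)
print(layer(wsi_bag).shape)  # torch.Size([1, 128, 768])
```

With such a layer in front of a frozen, text-pre-trained transformer, only a small fraction of parameters (the shortening layer and task head) needs to be fine-tuned, consistent with the under 0.1% figure reported in the abstract.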