
Efficient WSI classification with sequence reduction and transformers pretrained on text.

Author Information

Pisula Juan I, Bozek Katarzyna

Affiliations

Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.

Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany.

Publication Information

Sci Rep. 2025 Feb 15;15(1):5612. doi: 10.1038/s41598-025-88139-5.

Abstract

From computer vision to protein fold prediction, Language Models (LMs) have proven successful in transferring their representations of sequential data to a broad spectrum of tasks beyond natural language processing. Whole Slide Image (WSI) analysis in digital pathology lends itself naturally to transformer-based architectures. In a pre-processing step analogous to text tokenization, large microscopy images are tessellated into smaller image patches. However, because a WSI comprises thousands of such patches, WSI classification has not been addressed with deep transformer architectures, let alone with available text-pre-trained deep transformer language models. We introduce SeqShort, a multi-head attention-based sequence-shortening layer that summarizes a large WSI into a short, fixed-length sequence of feature vectors by removing redundant visual information. Our sequence-shortening mechanism not only reduces the computational cost of self-attention on large inputs, it also allows standard positional encodings to be added to the previously unordered bag of patches that compose a WSI. We use SeqShort to effectively classify WSIs in different digital pathology tasks using a deep, text pre-trained transformer model while fine-tuning less than 0.1% of its parameters, demonstrating that its knowledge of natural language transfers well to this domain.
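The abstract does not include an implementation, so the following is only a minimal sketch of what a multi-head attention-based sequence-shortening layer of this kind could look like in PyTorch: a fixed set of learned query tokens cross-attends over the variable-length bag of patch features and returns a short, fixed-length summary sequence. The class name SequenceShortener, the embedding dimension, the number of query tokens, and the number of heads are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of an attention-based sequence-shortening layer (assumed design,
# not the published SeqShort implementation).
import torch
import torch.nn as nn


class SequenceShortener(nn.Module):
    """Summarize a variable-length bag of patch features into a fixed-length
    sequence via multi-head cross-attention with learned query tokens."""

    def __init__(self, dim: int = 768, num_queries: int = 128, num_heads: int = 8):
        super().__init__()
        # Learned queries; their count fixes the output sequence length.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, n_patches, dim); n_patches may be thousands
        # of patch embeddings extracted from one WSI.
        batch = patch_features.shape[0]
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        # Cross-attention: the fixed queries attend over all patches and pool
        # the slide into a short summary sequence.
        summary, _ = self.attn(q, patch_features, patch_features)
        return self.norm(summary)  # (batch, num_queries, dim)


if __name__ == "__main__":
    # Example: reduce 5,000 patch embeddings to 128 tokens, which could then
    # receive standard positional encodings and be fed to a frozen,
    # text-pretrained transformer.
    shortener = SequenceShortener(dim=768, num_queries=128)
    wsi_patches = torch.randn(1, 5000, 768)
    print(shortener(wsi_patches).shape)  # torch.Size([1, 128, 768])
```

Because the output length is fixed by the number of learned queries rather than by the slide size, self-attention cost downstream no longer grows with the number of patches, and the summary tokens have a well-defined order to which positional encodings can be attached.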


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9099/11829941/344b9abe50bf/41598_2025_88139_Fig1_HTML.jpg
