When multiple instance learning meets foundation models: Advancing histological whole slide image analysis.
Author information
Xu Hongming, Wang Mingkang, Shi Duanbo, Qin Huamin, Zhang Yunpeng, Liu Zaiyi, Madabhushi Anant, Gao Peng, Cong Fengyu, Lu Cheng
Affiliations
Cancer Hospital of Dalian University of Technology, Dalian, China; School of Biomedical Engineering, Faculty of Medicine, Dalian University of Technology, Dalian, China; Key Laboratory of Integrated Circuit and Biomedical Electronic System, Liaoning Province, Dalian University of Technology, Dalian, China; Dalian Key Laboratory of Digital Medicine for Critical Diseases, Dalian University of Technology, Dalian, China.
School of Biomedical Engineering, Faculty of Medicine, Dalian University of Technology, Dalian, China.
Publication information
Med Image Anal. 2025 Apr;101:103456. doi: 10.1016/j.media.2025.103456. Epub 2025 Jan 14.
Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare with each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregation methods. This paper implemented and systematically compared six FMs and six recent MIL methods by combining different feature extraction and aggregation methods across seven clinically relevant end-to-end prediction tasks, using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators, such as attention-based pooling, transformers, and dynamic graphs, were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained with more diverse histological images, outperform generic models with smaller training datasets in patch embeddings, significantly enhancing downstream MIL classification accuracy and model training convergence speed; (2) instance feature fine-tuning, known as online feature re-embedding, which captures both fine-grained details and spatial interactions, can often further improve WSI classification performance; and (3) FMs advance MIL models by enabling promising grading classification, biomarker status prediction, and MSI prediction without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs aimed at more universally applicable diagnostic tasks, aligning with the evolving needs of clinical AI in pathology.
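To make the two-stage pipeline in the abstract concrete (patch-level embeddings from a frozen FM, then slide-level aggregation), here is a minimal sketch of attention-based pooling, one of the aggregator families the abstract names. It is not the authors' implementation. It follows the commonly used tanh-attention formulation and assumes the patch embeddings have already been extracted. The function name `attention_mil_pool`, the parameters `V` and `w`, and the random vectors standing in for FM embeddings are all hypothetical, and the weights are untrained. It uses only the Python standard library so that it runs without dependencies.

```python
import math
import random


def attention_mil_pool(patch_embeddings, V, w):
    """Aggregate a bag of patch embeddings into one slide-level embedding.

    patch_embeddings: list of N vectors of length D (hypothetical stand-ins
        for features produced by a frozen foundation model).
    V: H x D projection matrix, as a list of H rows (assumed, untrained).
    w: attention vector of length H (assumed, untrained).
    Returns (slide_embedding, attention_weights).
    """
    # Score each patch: s_i = w^T tanh(V h_i)
    scores = []
    for h in patch_embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * zk for wk, zk in zip(w, hidden)))

    # Softmax over the patches; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding: attention-weighted sum of the patch embeddings.
    dim = len(patch_embeddings[0])
    slide = [
        sum(a * h[d] for a, h in zip(weights, patch_embeddings))
        for d in range(dim)
    ]
    return slide, weights


if __name__ == "__main__":
    rng = random.Random(0)
    n_patches, dim, hidden_dim = 8, 4, 3
    # Random vectors stand in for FM patch embeddings of a single slide.
    bag = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
    V = [[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(hidden_dim)]
    w = [rng.gauss(0, 0.5) for _ in range(hidden_dim)]
    slide, attn = attention_mil_pool(bag, V, w)
    print("slide embedding:", slide)
    print("attention weights:", attn)
```

In a real pipeline the slide embedding would feed a small classifier trained with slide-level labels only, which is what makes the approach weakly supervised. The attention weights indicate which patches contributed most to the prediction.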