A pathology-attention multi-instance learning framework for multimodal classification of colorectal lesions.
Author information
Fu Fanglei, Zhang Xuemei, Wang Zhaoxuan, Xie Luxi, Fu Mingxi, Peng Jing, Wu Jianfeng, Wang Zhe, Guan Tian, He Yonghong, Lin Jin-Shun, Zhu Lianghui, Dai Wenbin
Affiliations
Department of Life and Health, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China.
Department of Pathology, Liuzhou People's Hospital Affiliated to Guangxi Medical University, Liuzhou, Guangxi, China.
Publication information
Front Pharmacol. 2025 Jun 6;16:1592950. doi: 10.3389/fphar.2025.1592950. eCollection 2025.
INTRODUCTION
Colorectal cancer is the third most common cancer worldwide, and accurate pathological diagnosis is crucial for clinical intervention and prognosis assessment. Although deep learning has shown promise in classifying whole slide images (WSIs) in digital pathology, existing weakly supervised methods struggle to fully model the multimodal diagnostic process, which involves both visual feature analysis and pathological knowledge. Additionally, staining variability and tissue heterogeneity hinder model generalization.
METHODS
We propose a multimodal weakly supervised learning framework named PAT-MIL (Pathology-Attention-MIL), which performs five-class WSI-level classification. The model integrates dynamic attention mechanisms with expert-defined text prototypes. It includes: (1) the construction of pathology knowledge-driven text prototypes for semantic guidance, (2) a refinement strategy that gradually adjusts category centers to adaptively improve prototype distribution, and (3) a loss balancing method that dynamically adjusts training weights based on gradient feedback to optimize both visual clustering and semantic alignment.
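The three components described above can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions (20 patches, 512-d features, 5 classes), not the authors' implementation: PAT-MIL's actual text prototypes are built from pathology knowledge via a text encoder, and its attention and loss-balancing details are not specified in the abstract. The inverse-gradient-norm weighting shown here is a simple stand-in for the paper's gradient-feedback scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_with_text(bag, W_a, w, W_c, text_protos):
    """Attention-pool instance features into a slide embedding, then score it
    against both a visual classifier and class-level text prototypes."""
    scores = np.tanh(bag @ W_a) @ w          # (n,) per-instance attention scores
    attn = softmax(scores)                   # attention weights over instances
    slide = attn @ bag                       # (d,) WSI-level embedding
    logits = slide @ W_c                     # (k,) visual-branch class logits
    protos = text_protos / np.linalg.norm(text_protos, axis=1, keepdims=True)
    sim = protos @ slide / (np.linalg.norm(slide) + 1e-8)  # (k,) cosine alignment
    return attn, logits, sim

def balance_loss_weights(grad_norms):
    """Inverse-gradient-norm weighting: the loss branch whose gradient currently
    dominates is down-weighted (a stand-in for the gradient-feedback balancing)."""
    g = np.asarray(grad_norms, dtype=float)
    w = 1.0 / (g + 1e-8)
    return w / w.sum() * len(w)              # normalized so weights average to 1

# Toy dimensions: 20 patches, 512-d features, 128-d attention space, 5 classes.
bag = rng.normal(size=(20, 512))
W_a = rng.normal(size=(512, 128)) * 0.05
w = rng.normal(size=(128,))
W_c = rng.normal(size=(512, 5)) * 0.05
text_protos = rng.normal(size=(5, 512))

attn, logits, sim = attention_mil_with_text(bag, W_a, w, W_c, text_protos)
```

In training, `logits` would feed a visual clustering loss and `sim` a semantic-alignment loss, with `balance_loss_weights` rescaling the two terms each step; the prototype-refinement step would periodically shift each text prototype toward the mean embedding of its class.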
RESULTS
PAT-MIL achieves an accuracy of 86.45% (AUC = 0.9624) on an internal five-class dataset, outperforming ABMIL and DSMIL by 2.96% and 2.19%, respectively. On the external datasets CRS-2024 and UniToPatho, the model reaches 95.78% and 84.09% accuracy, exceeding the best baselines by 2.22% and 5.68%, respectively.
DISCUSSION
These results demonstrate that PAT-MIL effectively mitigates staining variability and enhances cross-center generalization through the collaborative modeling of visual and textual modalities. It provides a robust solution for colorectal lesion classification without relying on pixel-level annotations, advancing the field of multimodal pathological image analysis.