A pathology-attention multi-instance learning framework for multimodal classification of colorectal lesions.
Author information
Fu Fanglei, Zhang Xuemei, Wang Zhaoxuan, Xie Luxi, Fu Mingxi, Peng Jing, Wu Jianfeng, Wang Zhe, Guan Tian, He Yonghong, Lin Jin-Shun, Zhu Lianghui, Dai Wenbin
Affiliations
Department of Life and Health, Shenzhen International Graduate School, Tsinghua University, Shenzhen, Guangdong, China.
Department of Pathology, Liuzhou People's Hospital Affiliated to Guangxi Medical University, Liuzhou, Guangxi, China.
Publication information
Front Pharmacol. 2025 Jun 6;16:1592950. doi: 10.3389/fphar.2025.1592950. eCollection 2025.
INTRODUCTION
Colorectal cancer is the third most common cancer worldwide, and accurate pathological diagnosis is crucial for clinical intervention and prognosis assessment. Although deep learning has shown promise in classifying whole slide images (WSIs) in digital pathology, existing weakly supervised methods struggle to fully model the multimodal diagnostic process, which involves both visual feature analysis and pathological knowledge. Additionally, staining variability and tissue heterogeneity hinder model generalization.
METHODS
We propose a multimodal weakly supervised learning framework named PAT-MIL (Pathology-Attention-MIL), which performs five-class WSI-level classification. The model integrates dynamic attention mechanisms with expert-defined text prototypes. It includes: (1) the construction of pathology knowledge-driven text prototypes for semantic guidance, (2) a refinement strategy that gradually adjusts category centers to adaptively improve prototype distribution, and (3) a loss balancing method that dynamically adjusts training weights based on gradient feedback to optimize both visual clustering and semantic alignment.
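The three components described above can be sketched as follows. This is a minimal NumPy illustration under assumed dimensions (20 patches, 512-d features, 5 classes), not the authors' implementation: PAT-MIL's actual text prototypes are built from pathology knowledge via a text encoder, and its attention and loss-balancing details are not specified in the abstract. The inverse-gradient-norm weighting shown here is a simple stand-in for the paper's gradient-feedback scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_with_text(bag, W_a, w, W_c, text_protos):
    """Attention-pool instance features into a slide embedding, then score it
    against both a visual classifier and class-level text prototypes."""
    scores = np.tanh(bag @ W_a) @ w          # (n,) per-instance attention scores
    attn = softmax(scores)                   # attention weights over instances
    slide = attn @ bag                       # (d,) WSI-level embedding
    logits = slide @ W_c                     # (k,) visual-branch class logits
    protos = text_protos / np.linalg.norm(text_protos, axis=1, keepdims=True)
    sim = protos @ slide / (np.linalg.norm(slide) + 1e-8)  # (k,) cosine alignment
    return attn, logits, sim

def balance_loss_weights(grad_norms):
    """Inverse-gradient-norm weighting: the loss branch whose gradient currently
    dominates is down-weighted (a stand-in for the gradient-feedback balancing)."""
    g = np.asarray(grad_norms, dtype=float)
    w = 1.0 / (g + 1e-8)
    return w / w.sum() * len(w)              # normalized so weights average to 1

# Toy dimensions: 20 patches, 512-d features, 128-d attention space, 5 classes.
bag = rng.normal(size=(20, 512))
W_a = rng.normal(size=(512, 128)) * 0.05
w = rng.normal(size=(128,))
W_c = rng.normal(size=(512, 5)) * 0.05
text_protos = rng.normal(size=(5, 512))

attn, logits, sim = attention_mil_with_text(bag, W_a, w, W_c, text_protos)
```

In training, `logits` would feed a visual clustering loss and `sim` a semantic-alignment loss, with `balance_loss_weights` rescaling the two terms each step; the prototype-refinement step would periodically shift each text prototype toward the mean embedding of its class.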
RESULTS
PAT-MIL achieves an accuracy of 86.45% (AUC = 0.9624) on an internal five-class dataset, outperforming ABMIL and DSMIL by 2.96% and 2.19%, respectively. On the external datasets CRS-2024 and UniToPatho, the model reaches 95.78% and 84.09% accuracy, exceeding the best baselines by 2.22% and 5.68%, respectively.
DISCUSSION
These results demonstrate that PAT-MIL effectively mitigates staining variability and enhances cross-center generalization through the collaborative modeling of visual and textual modalities. It provides a robust solution for colorectal lesion classification without relying on pixel-level annotations, advancing the field of multimodal pathological image analysis.