Dang Thao M, Zhou Qifeng, Guo Yuzhi, Ma Hehuan, Na Saiyang, Dang Thao Bich, Gao Jean, Huang Junzhou
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States.
Department of Pulmonary and Critical Care, University of Arizona, Phoenix, AZ, United States.
Front Med (Lausanne). 2025 Feb 25;12:1546452. doi: 10.3389/fmed.2025.1546452. eCollection 2025.
Whole slide images (WSIs) play a vital role in cancer diagnosis and prognosis. However, their gigapixel resolution, lack of pixel-level annotations, and reliance on unimodal visual data present challenges for accurate and efficient computational analysis. Existing methods typically divide WSIs into thousands of patches, which increases computational demands and makes it challenging to effectively focus on diagnostically relevant regions. Furthermore, these methods frequently rely on feature extractors pretrained on natural images, which are not optimized for pathology tasks, and overlook multimodal data sources such as cellular and textual information that can provide critical insights. To address these limitations, we propose the Abnormality-Aware Multimodal (AAMM) learning framework, which integrates abnormality detection and multimodal feature learning for WSI classification. AAMM incorporates a Gaussian Mixture Variational Autoencoder (GMVAE) to identify and select the most informative patches, reducing computational complexity while retaining critical diagnostic information. It further integrates multimodal features from pathology-specific foundation models, combining patch-level, cell-level, and text-level representations through cross-attention mechanisms. This approach enhances the ability to comprehensively analyze WSIs for cancer diagnosis and subtyping. Extensive experiments on normal-tumor classification and cancer subtyping demonstrate that AAMM achieves superior performance compared to state-of-the-art methods. By combining abnormality detection with multimodal feature integration, our framework offers an efficient and scalable solution for advancing computational pathology.
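To illustrate the cross-attention fusion of patch-, cell-, and text-level representations described in the abstract, the following is a minimal PyTorch-style sketch. All module names, dimensions, and the two-class head are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch: fusing patch-, cell-, and text-level embeddings via
# cross-attention, then pooling to a slide-level prediction. Dimensions and
# layer choices are assumed for illustration only.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=512, num_heads=8, num_classes=2):
        super().__init__()
        # Patch tokens attend to cell tokens, then the result attends to text tokens.
        self.patch_cell_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.patch_text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)  # e.g., normal vs. tumor

    def forward(self, patch_feats, cell_feats, text_feats):
        # patch_feats: (B, Np, dim), cell_feats: (B, Nc, dim), text_feats: (B, Nt, dim)
        x, _ = self.patch_cell_attn(patch_feats, cell_feats, cell_feats)
        x, _ = self.patch_text_attn(self.norm(x + patch_feats), text_feats, text_feats)
        pooled = x.mean(dim=1)          # simple mean pooling over selected patches
        return self.classifier(pooled)  # slide-level logits

# Example usage with random tensors standing in for foundation-model embeddings
# of GMVAE-selected patches, segmented cells, and report text.
model = CrossModalFusion()
logits = model(torch.randn(1, 200, 512), torch.randn(1, 300, 512), torch.randn(1, 16, 512))
```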