Pan Qingtao, Li Zhengrong, Qiao Wenhao, Lou Jingjiao, Yang Qing, Yang Guang, Ji Bing
IEEE Trans Med Imaging. 2025 May 23;PP. doi: 10.1109/TMI.2025.3573018.
Low-quality pseudo-labels pose a significant obstacle in semi-supervised medical image segmentation (SSMIS), impeding consistency learning on unlabeled data. Leveraging vision-language models (VLMs) holds promise for improving pseudo-label quality by employing textual prompts to delineate segmentation regions, but it faces the challenge of cross-modal alignment uncertainty due to multiple correspondences (multiple images/texts tend to correspond to one text/image). Existing VLMs address this challenge by modeling semantics as distributions, but such distributions lead to semantic degradation. To address these problems, we propose the Alignment-Multiplicity Aware Vision-Language Model (AMVLM), a new VLM pre-training paradigm with two novel similarity-metric strategies. (i) Cross-modal Similarity Supervision (CSS) introduces a probability distribution transformer to supervise similarity scores across fine-granularity semantics by measuring cross-modal distribution disparities, thus learning cross-modal multiple alignments. (ii) Intra-modal Contrastive Learning (ICL) takes into account the similarity metric of coarse-fine granularity information within each modality to encourage cross-modal semantic consistency. Furthermore, using the pretrained AMVLM, we propose a pioneering text-guided SSMIS network to compensate for the quality deficiencies of pseudo-labels. This network incorporates a text mask generator to produce multimodal supervision information, enhancing pseudo-label quality and the model's consistency learning. Extensive experiments validate the efficacy of our AMVLM-driven SSMIS, showing superior performance across four publicly available datasets. The code will be available at: https://github.com/QingtaoPan/AMVLM.
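The abstract describes supervising cross-modal similarity scores by measuring distribution disparities between image-to-text and text-to-image similarity distributions. The following is a minimal illustrative sketch of that general idea, not the authors' actual CSS/ICL implementation: a CLIP-style symmetric contrastive loss plus a symmetric-KL disparity term between the two cross-modal similarity distributions. All function names and the temperature value are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_losses(img_emb, txt_emb, temperature=0.07):
    """Toy stand-in for distribution-based cross-modal supervision.

    Returns a CLIP-style contrastive loss and a KL-based disparity
    between image->text and text->image similarity distributions.
    This is NOT the paper's CSS module, only a generic sketch.
    """
    # L2-normalize embeddings so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix

    p_i2t = softmax(logits, axis=1)             # image -> text distribution
    p_t2i = softmax(logits.T, axis=1)           # text -> image distribution

    n = img.shape[0]
    idx = np.arange(n)
    # symmetric InfoNCE: matched pairs lie on the diagonal
    contrastive = -(np.log(p_i2t[idx, idx]) +
                    np.log(p_t2i[idx, idx])).mean() / 2.0

    # disparity between the two cross-modal distributions (KL divergence);
    # a distribution-matching term of this kind could supervise similarity
    # scores, in the spirit of (but not identical to) the paper's CSS
    eps = 1e-9
    kl = (p_i2t * (np.log(p_i2t + eps) - np.log(p_t2i + eps))).sum(axis=1).mean()
    return contrastive, kl
```

With perfectly matched, symmetric embeddings the two similarity distributions coincide and the disparity term vanishes, which is the behavior such a supervision signal is designed to encourage.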