Large-vocabulary segmentation for medical images with text prompts.

Author information

Zhao Ziheng, Zhang Yao, Wu Chaoyi, Zhang Xiaoman, Zhou Xiao, Zhang Ya, Wang Yanfeng, Xie Weidi

Affiliations

Shanghai Jiao Tong University, Shanghai, China.

Shanghai Artificial Intelligence Laboratory, Shanghai, China.

Publication information

NPJ Digit Med. 2025 Sep 2;8(1):566. doi: 10.1038/s41746-025-01964-w.

DOI: 10.1038/s41746-025-01964-w

PMID: 40897901

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12405521/

Abstract

This paper aims to build a model that can Segment Anything in 3D medical images, driven by medical terminologies as Text prompts, termed SAT. Our main contributions are three-fold: (i) We construct the first multimodal knowledge tree on human anatomy, including 6502 anatomical terminologies; then, we build the largest and most comprehensive segmentation dataset for training, collecting over 22K 3D scans from 72 datasets across 497 classes, with careful standardization of both the image and label spaces; (ii) We propose to inject medical knowledge into a text encoder via contrastive learning, and formulate a large-vocabulary segmentation model that can be prompted by medical terminologies in text form; (iii) We train SAT-Nano (110M parameters) and SAT-Pro (447M parameters). Over 497 categories, SAT-Pro achieves performance comparable to 72 nnU-Nets, the strongest specialist models, each trained on a single dataset (over 2.2B parameters combined). Compared with the interactive approach MedSAM, SAT-Pro consistently outperforms it across all 7 human body regions, with an average Dice Similarity Coefficient (DSC) improvement of +7.1%, while showing enhanced scalability and robustness. On 2 external (cross-center) datasets, SAT-Pro outperforms all baselines (+3.7% average DSC), demonstrating superior generalization ability.
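The abstract describes the design only at a high level. The sketch below is a minimal, hypothetical illustration, not SAT's released code or its actual decoder, of two ideas the abstract relies on. The first is prompting a segmentation head with a text embedding, here assuming a simple per-voxel dot-product scoring head. The second is the Dice Similarity Coefficient used to report the results, DSC = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G. The function names `text_prompted_mask` and `dice_coefficient`, the toy feature values, and the 0.5 threshold are assumptions for illustration. In a real model, the embedding would come from the knowledge-enhanced text encoder and the voxel features from a 3D image network.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def text_prompted_mask(
    voxel_features: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Hypothetical scoring head: one binary mask per text prompt.

    Each voxel's feature vector is compared with the prompt embedding by a
    dot product. The logit is passed through a sigmoid and thresholded.
    Assumes voxel features and the text embedding share the same dimension.
    """
    mask = []
    for feat in voxel_features:
        if len(feat) != len(text_embedding):
            raise ValueError("feature and embedding dimensions differ")
        logit = sum(f * t for f, t in zip(feat, text_embedding))
        mask.append(1 if sigmoid(logit) >= threshold else 0)
    return mask


def dice_coefficient(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for flattened binary masks.

    eps keeps the ratio defined (and equal to 1.0) when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * g for p, g in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


# Toy example: 4 voxels with 2-D features and one prompt embedding
# (hypothetical values standing in for an encoded anatomical term).
features = [[2.0, 0.0], [0.5, 1.5], [1.0, 0.0], [0.0, 2.0]]
prompt = [1.0, -1.0]
pred = text_prompted_mask(features, prompt)
print(pred)                                             # [1, 0, 1, 0]
print(round(dice_coefficient(pred, [1, 1, 1, 0]), 3))  # 0.8
```

Under this assumption, each text prompt yields its own mask, so a single model can cover many classes by encoding a different term per query. That is one way a large label vocabulary can be served without training a separate network per dataset.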
