Necessity and impact of specialization of large foundation model for medical segmentation tasks.

Author information

Nguyen Eric, Liu Hengjie, Ruan Dan

Affiliations

Department of Radiation Oncology, University of California Los Angeles, Los Angeles, California, USA.

Department of Bioengineering, University of California Los Angeles, Los Angeles, California, USA.

Publication information

Med Phys. 2025 Jan;52(1):321-328. doi: 10.1002/mp.17470. Epub 2024 Oct 21.

Abstract

BACKGROUND

Large foundation models, such as the Segment Anything Model (SAM), have shown remarkable performance in image segmentation tasks. However, how best to make these models truly useful for domain-specific applications, such as medical image segmentation, remains an open question. Recent work has released MedSAM, a medical version of the foundation model trained on a large volume of medical data, and claims state-of-the-art (SOTA) medical segmentation performance. Independent inspection and dissection by the community are needed.

PURPOSE

Foundation models are developed for general purposes, whereas clinical utility depends on stable delivery of reliable performance. This study aims to elucidate the potential advantages and limitations of deploying foundation models in clinical use by assessing the performance of the off-the-shelf medical foundation model MedSAM for segmenting anatomical structures in pelvic MR images. We also explore simple remedies by evaluating how performance depends on the prompting scheme. Finally, we demonstrate the need for, and the performance gain from, further specialized fine-tuning.

METHODS

MedSAM and its lightweight version LiteMedSAM were evaluated out-of-the-box on a public MR dataset consisting of 589 pelvic images split 80:20 for training and testing. An nnU-Net model was trained from scratch to serve as a benchmark and to provide bounding box prompts for MedSAM. MedSAM was evaluated using bounding boxes of differing quality: those derived from ground truth labels, those derived from nnU-Net predictions, and both of these with a 5-pixel isometric expansion. Lastly, LiteMedSAM was fine-tuned on the training set and reevaluated on this task.
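As an illustration of the bounding box prompts described above, the following is a minimal sketch of deriving a box from a binary label mask and optionally expanding it by a fixed number of pixels. It is not the authors' code: the function name `mask_to_box`, the `(x_min, y_min, x_max, y_max)` box convention, and the clipping of the expanded box to the image bounds are all assumptions made for this example, and plain Python lists stand in for image arrays.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # 2D binary mask: rows of 0/1 values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Mask, pad: int = 0) -> Optional[Box]:
    """Tight bounding box of the nonzero pixels, expanded by `pad` pixels on
    every side and clipped to the image (clipping is an assumption of this
    sketch). Returns None when the mask has no foreground."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Indices of rows and columns that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    x_min = max(cols[0] - pad, 0)
    y_min = max(rows[0] - pad, 0)
    x_max = min(cols[-1] + pad, width - 1)
    y_max = min(rows[-1] + pad, height - 1)
    return (x_min, y_min, x_max, y_max)
```

Calling `mask_to_box(label)` would give the tight box, and `mask_to_box(label, pad=5)` the expanded variant; the same function could be applied to either a ground truth mask or a predicted mask. Note that a single box around a disjoint structure necessarily also encloses the gap between its parts.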

RESULTS

Out-of-the-box MedSAM and LiteMedSAM both performed poorly across the structure set, especially for disjoint or non-convex structures. Varying the prompt with different bounding box inputs had minimal effect. For example, for the obturator internus, the mean Dice scores using MedSAM and LiteMedSAM were 0.251 ± 0.110 and 0.101 ± 0.079, and the mean Hausdorff distances were 34.142 ± 5.196 mm and 33.688 ± 5.306 mm, respectively. Fine-tuning LiteMedSAM led to a significant performance gain, improving the Dice score and Hausdorff distance for the obturator internus to 0.864 ± 0.123 and 5.022 ± 10.684 mm, on par with nnU-Net, with no significant difference for most structures. All segmented structures benefited significantly from specialized refinement, with varying margins of improvement.
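For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how a Dice score and a Hausdorff distance can be computed for a pair of 2D binary masks. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names and the `spacing` parameter (pixel size in mm) are hypothetical, the Hausdorff distance is taken over all foreground pixels by brute force, and the abstract does not say which variant (for example surface-based or percentile-based) the authors used.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # 2D binary mask: rows of 0/1 values


def dice_score(pred: Mask, truth: Mask) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|). Returns 1.0 if both masks are empty
    (a convention chosen for this sketch)."""
    inter = sum(1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t)
    total = sum(1 for row in pred for v in row if v) + sum(1 for row in truth for v in row if v)
    return 1.0 if total == 0 else 2.0 * inter / total


def _foreground(mask: Mask, spacing: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Physical (row, column) coordinates of every foreground pixel."""
    return [(y * spacing[0], x * spacing[1])
            for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def hausdorff_distance(pred: Mask, truth: Mask,
                       spacing: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance between the two foreground point sets.
    Brute force, O(N*M); returns infinity if either mask is empty."""
    a, b = _foreground(pred, spacing), _foreground(truth, spacing)
    if not a or not b:
        return math.inf

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))
```

Dice rewards overlap and ranges from 0 to 1, while the Hausdorff distance reports the worst-case mismatch between the two regions, so a single stray cluster of predicted pixels can inflate it even when the Dice score is high.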

CONCLUSION

While our study points to the potential of deep learning models like MedSAM and LiteMedSAM for medical segmentation, it highlights the need for specialized refinement and adjudication. Off-the-shelf use of such large foundation models is highly likely to be suboptimal, and specialized fine-tuning is often necessary to achieve clinically desired accuracy and stability.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d742/11699994/afcf3d62af77/MP-52-321-g003.jpg
