DIA2M, DRCI, CHU Clermont-Ferrand, Clermont-Ferrand, France.
Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1157-1163. doi: 10.1007/s11548-024-03125-y. Epub 2024 Apr 12.
We investigate whether foundation models pretrained on diverse visual data could be beneficial to surgical computer vision. We use instrument and uterus segmentation in minimally invasive procedures as benchmarks. We propose multiple supervised, unsupervised and few-shot supervised adaptations of foundation models, including two novel adaptation methods.
We use DINOv1, DINOv2, DINOv2 with registers, and SAM backbones, with the ART-Net surgical instrument and the SurgAI3.8K uterus segmentation datasets. We investigate five approaches: DINO unsupervised, few-shot learning with a linear decoder, supervised learning with the proposed DINO-UNet adaptation, DPT with DINO encoder, and unsupervised learning with the proposed SAM adaptation.
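As a hedged illustration of the few-shot linear-decoder approach (a sketch, not the authors' implementation): patch tokens from a frozen self-supervised ViT encoder are classified per patch by a single linear layer, and the resulting low-resolution logit map is bilinearly upsampled to the image grid. Random tensors stand in here for DINOv2 ViT-S/14 patch embeddings (384-dimensional tokens on a 16x16 grid for a 224x224 input); the encoder call itself is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed geometry: DINOv2 ViT-S/14 yields 384-dim patch tokens;
# a 224x224 input with patch size 14 gives a 16x16 token grid.
EMB_DIM, GRID, N_CLASSES = 384, 16, 2

class LinearDecoder(nn.Module):
    """Per-patch linear classifier on frozen encoder tokens (few-shot head)."""
    def __init__(self, emb_dim: int, n_classes: int):
        super().__init__()
        self.head = nn.Linear(emb_dim, n_classes)

    def forward(self, tokens: torch.Tensor, out_hw) -> torch.Tensor:
        # tokens: (B, GRID*GRID, EMB_DIM); classify each patch independently.
        b, n, _ = tokens.shape
        g = int(n ** 0.5)
        logits = self.head(tokens)                            # (B, N, C)
        logits = logits.permute(0, 2, 1).reshape(b, -1, g, g)  # (B, C, G, G)
        # Bilinear upsampling recovers full-resolution segmentation logits.
        return F.interpolate(logits, size=out_hw, mode="bilinear",
                             align_corners=False)

# Few-shot training sketch: only the linear head is optimised; the frozen
# encoder is simulated by fixed random tokens (hypothetical stand-in data).
torch.manual_seed(0)
decoder = LinearDecoder(EMB_DIM, N_CLASSES)
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
tokens = torch.randn(4, GRID * GRID, EMB_DIM)       # stand-in frozen features
masks = torch.randint(0, N_CLASSES, (4, 224, 224))  # stand-in few-shot masks

for _ in range(5):
    opt.zero_grad()
    logits = decoder(tokens, (224, 224))
    loss = F.cross_entropy(logits, masks)
    loss.backward()
    opt.step()

pred = logits.argmax(dim=1)
print(pred.shape)  # torch.Size([4, 224, 224])
```

The head has only EMB_DIM * N_CLASSES + N_CLASSES parameters (770 here), which is why it can be fitted from a handful of annotated frames.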
We evaluate 17 models for instrument segmentation and 7 models for uterus segmentation, and compare them with existing task-specific models. We show that the linear decoder can be learned with few shots. The unsupervised and linear decoder methods obtain slightly subpar results but could be useful in data-scarce settings. The unsupervised SAM model produces finer edges but has inconsistent outputs. However, DPT and DINO-UNet obtain strikingly good results, defining a new state of the art by outperforming the previous best by 5.6 and 4.1 pp in instrument segmentation and by 4.4 and 1.5 pp in uterus segmentation. Both methods achieve semantic and spatial precision, accurately segmenting intricate details.
Our results demonstrate the strong potential of DINO and SAM for surgical computer vision, indicating a promising role for visual foundation models in medical image analysis, particularly in scenarios with limited or complex data.