Zero-shot prompt-based video encoder for surgical gesture recognition.

Authors

Rao Mingxing, Qin Yinhong, Kolouri Soheil, Wu Jie Ying, Moyer Daniel

Affiliation

Department of Computer Science, Vanderbilt University, Nashville, USA.

Publication

Int J Comput Assist Radiol Surg. 2025 Feb;20(2):311-321. doi: 10.1007/s11548-024-03257-1. Epub 2024 Sep 17.

DOI:10.1007/s11548-024-03257-1
PMID:39287713
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11807915/
Abstract

PURPOSE

To produce a surgical gesture recognition system that can support a wide variety of procedures, either a very large annotated dataset must be acquired, or fitted models must generalize to new labels (so-called zero-shot capability). In this paper we investigate the feasibility of the latter option.

METHODS

Leveraging the Bridge-Prompt framework, we prompt-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This approach can draw on extensive outside data, such as text, and can also make use of label metadata and weakly supervised contrastive losses.
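
The abstract does not spell out the contrastive loss. As a rough illustration only, the sketch below implements the standard CLIP-style symmetric contrastive (InfoNCE) objective between video-clip embeddings and label-text embeddings. The function names, the toy 3-dimensional embeddings, and the temperature of 0.07 are assumptions made for this example and are not taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_logits(video_embs, text_embs, temperature):
    """Pairwise cosine similarities divided by the temperature."""
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    return [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
            for vi in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Average of the video-to-text and text-to-video losses.

    Assumes video_embs[i] and text_embs[i] form a matched pair.
    """
    logits = cosine_logits(video_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

# Toy embeddings (hypothetical values): two clips and their label texts.
video = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
text_matched = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0]]
text_swapped = [text_matched[1], text_matched[0]]

loss_matched = symmetric_contrastive_loss(video, text_matched)
loss_swapped = symmetric_contrastive_loss(video, text_swapped)
print(loss_matched < loss_swapped)  # correctly paired texts give the lower loss
```

Minimizing a loss of this kind pulls each clip embedding toward the text embedding of its own label and pushes it away from the other labels in the batch, which is what lets label text act as a classifier later on.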

RESULTS

Our experiments show that the prompt-based video encoder outperforms standard encoders in surgical gesture recognition tasks. Notably, it displays strong performance in zero-shot scenarios, where gestures/tasks that were not provided during the encoder training phase are included in the prediction phase. Additionally, we measure the benefit of including text descriptions in the feature extractor training schema.
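
To make the zero-shot setting concrete, here is a minimal sketch of how a vision-text model can score a clip against labels it never saw during training: each candidate label's text description is embedded, and the clip is assigned to the label with the highest cosine similarity. The gesture names, the embedding values, and the `zero_shot_predict` helper are hypothetical. In a real system both kinds of embedding would come from the trained video and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_predict(clip_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the clip."""
    return max(label_embeddings,
               key=lambda name: cosine(clip_embedding, label_embeddings[name]))

# Hypothetical text embeddings for three gesture descriptions.
label_embeddings = {
    "reaching for needle": [0.9, 0.1, 0.0],
    "pushing needle through tissue": [0.1, 0.9, 0.1],
    "pulling suture": [0.0, 0.2, 0.9],  # treated as unseen during training
}

# Hypothetical video-encoder output for one clip.
clip_embedding = [0.1, 0.3, 0.8]

prediction = zero_shot_predict(clip_embedding, label_embeddings)
print(prediction)  # prints: pulling suture
```

Because a new label only needs a text description to enter `label_embeddings`, adding a gesture at prediction time requires no retraining of the video encoder in this setup.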

CONCLUSION

Bridge-Prompt and similar pre-trained + prompt-tuned video encoder models provide valuable visual representations for surgical robotics, especially in gesture recognition tasks. Given the diverse range of surgical tasks (gestures), the ability of these models to transfer zero-shot, without any task-specific (gesture-specific) retraining, makes them invaluable.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7e1/11807915/2730b56b2774/11548_2024_3257_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7e1/11807915/bfea20e476dd/11548_2024_3257_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7e1/11807915/08af2653e2a2/11548_2024_3257_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c7e1/11807915/5b54466ae601/11548_2024_3257_Fig4_HTML.jpg

