Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery.

Affiliation

Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.

Publication information

Int J Comput Assist Radiol Surg. 2021 May;16(5):779-787. doi: 10.1007/s11548-021-02343-y. Epub 2021 Mar 24.

DOI:10.1007/s11548-021-02343-y
PMID:33759079
Abstract

PURPOSE

Multi- and cross-modal learning consolidates information from multiple data sources which may offer a holistic representation of complex scenarios. Cross-modal learning is particularly interesting, because synchronized data streams are immediately useful as self-supervisory signals. The prospect of achieving self-supervised continual learning in surgical robotics is exciting as it may enable lifelong learning that adapts to different surgeons and cases, ultimately leading to a more general machine understanding of surgical processes.

METHODS

We present a learning paradigm using synchronous video and kinematics from robot-mediated surgery. Our approach relies on an encoder-decoder network that maps optical flow to the corresponding kinematics sequence. Clustering on the latent representations reveals meaningful groupings for surgeon gesture and skill level. We demonstrate the generalizability of the representations on the JIGSAWS dataset by classifying skill and gestures on tasks not used for training.
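The self-supervised signal here is that the recorded kinematics serve as free targets for the video stream, so no manual labels are needed to train the encoder. Below is a minimal plain-Python sketch of that objective. It is not the authors' network. It assumes each clip has already been reduced to a fixed-length optical-flow feature vector paired with a kinematics vector, and it uses two linear layers trained by SGD on squared error. `LinearEncoderDecoder`, its methods, and the synthetic data are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def small_random(rows, cols, rng, scale=0.1):
    """Random weights of shape rows x cols, drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LinearEncoderDecoder:
    """Hypothetical flow -> latent -> kinematics model with two linear layers."""

    def __init__(self, flow_dim, latent_dim, kin_dim, seed=0):
        rng = random.Random(seed)
        self.enc = small_random(latent_dim, flow_dim, rng)  # E: flow -> latent
        self.dec = small_random(kin_dim, latent_dim, rng)   # D: latent -> kinematics

    def encode(self, flow):
        """Latent code z = E x; these codes are what would be clustered later."""
        return matvec(self.enc, flow)

    def predict(self, flow):
        """Predicted kinematics D E x."""
        return matvec(self.dec, self.encode(flow))

    def loss(self, flows, kins):
        """Mean squared error between predicted and recorded kinematics."""
        total = 0.0
        for x, y in zip(flows, kins):
            total += sum((p - t) ** 2 for p, t in zip(self.predict(x), y))
        return total / len(flows)

    def step(self, x, y, lr):
        """One SGD step on 0.5 * ||D E x - y||^2 for one synchronized pair."""
        z = self.encode(x)
        r = [p - t for p, t in zip(matvec(self.dec, z), y)]  # residual
        # Back-propagated signal D^T r, computed before D is modified.
        back = [sum(self.dec[k][h] * r[k] for k in range(len(r)))
                for h in range(len(z))]
        for k in range(len(r)):          # dL/dD = r z^T
            for h in range(len(z)):
                self.dec[k][h] -= lr * r[k] * z[h]
        for h in range(len(z)):          # dL/dE = (D^T r) x^T
            for d in range(len(x)):
                self.enc[h][d] -= lr * back[h] * x[d]

    def fit(self, flows, kins, epochs=200, lr=0.02):
        for _ in range(epochs):
            for x, y in zip(flows, kins):
                self.step(x, y, lr)
        return self


# Usage on synthetic data: "kinematics" are a fixed linear function of "flow".
rng = random.Random(1)
mixing = small_random(3, 6, rng, scale=1.0)
flows = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(40)]
kins = [matvec(mixing, x) for x in flows]
model = LinearEncoderDecoder(flow_dim=6, latent_dim=4, kin_dim=3)
before = model.loss(flows, kins)
model.fit(flows, kins)
after = model.loss(flows, kins)
latents = [model.encode(x) for x in flows]
```

The real model works on optical-flow sequences with a deep encoder-decoder. The linear version only makes the training signal explicit: the decoder is discarded after training, and the encoder outputs (`latents`) become the representation used downstream.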

RESULTS

For tasks seen in training, we report a 59 to 70% accuracy in surgical gesture classification. On tasks beyond the training setup, we note a 45 to 65% accuracy. Qualitatively, we find that unseen gestures form clusters in the latent space of novice actions, which may enable the automatic identification of novel interactions in a lifelong learning scenario.
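These accuracies measure how well gesture labels can be read off the learned latent codes, both for seen tasks and for held-out ones. The abstract does not say which classifier was used. The sketch below therefore assumes a simple k-nearest-neighbour vote in latent space: labelled latent codes form the reference set, and accuracy is the fraction of query codes whose majority-vote label matches the annotation. The function names, `k=3`, and the toy two-dimensional codes and labels are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter


def knn_predict(ref_codes, ref_labels, query, k=3):
    """Majority label among the k reference codes nearest (Euclidean) to query."""
    nearest = sorted(range(len(ref_codes)),
                     key=lambda i: math.dist(ref_codes[i], query))[:k]
    votes = Counter(ref_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(ref_codes, ref_labels, test_codes, test_labels, k=3):
    """Fraction of test codes whose predicted label equals the annotated one."""
    correct = sum(knn_predict(ref_codes, ref_labels, z, k) == y
                  for z, y in zip(test_codes, test_labels))
    return correct / len(test_codes)


# Toy latent codes: two gesture classes that separate in a 2-D latent space.
ref_codes = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
ref_labels = ["G1", "G1", "G1", "G2", "G2", "G2"]
test_codes = [[0.05, 0.05], [1.05, 1.0], [0.9, 0.95]]
test_labels = ["G1", "G2", "G1"]  # the last annotation disagrees with its neighbours
acc = knn_accuracy(ref_codes, ref_labels, test_codes, test_labels)
```

For the held-out setting, the reference and query codes would come from different tasks. That way the score reflects how well the representation transfers, not how well the model memorised one task.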

CONCLUSION

From predicting the synchronous kinematics sequence, optical flow representations of surgical scenes emerge that separate well even for new tasks that the model had not seen before. While the representations are useful immediately for a variety of tasks, the self-supervised learning paradigm may enable research in lifelong and user-specific learning.
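The abstract also says that clustering the latent representations yields meaningful gesture and skill groupings, and that codes from unseen tasks still separate well. It does not name the clustering algorithm. The sketch below assumes plain Lloyd's k-means on latent vectors, seeded with the first k points so that it is deterministic. This is a simplification, since k-means++ seeding with several restarts is the usual practical choice. `kmeans` and the toy codes are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign, centroids


# Toy latent codes from two well-separated groups, interleaved in input order.
codes = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.0, 5.1], [0.0, 0.2], [4.9, 5.0]]
assign, centroids = kmeans(codes, k=2)
```

Cluster indices are arbitrary. To compare the clusters with gesture or skill annotations, each cluster would be matched to its majority label, or a label-agnostic agreement score would be used.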

Similar articles

1
Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery.
Int J Comput Assist Radiol Surg. 2021 May;16(5):779-787. doi: 10.1007/s11548-021-02343-y. Epub 2021 Mar 24.
2
Biomechanics-machine learning system for surgical gesture analysis and development of technologies for minimal access surgery.
Surg Innov. 2014 Oct;21(5):504-12. doi: 10.1177/1553350613510612. Epub 2013 Dec 2.
3
Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery.
Int J Comput Assist Radiol Surg. 2018 Dec;13(12):1959-1970. doi: 10.1007/s11548-018-1860-1. Epub 2018 Sep 25.
4
Multimodal semi-supervised learning for online recognition of multi-granularity surgical workflows.
Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1075-1083. doi: 10.1007/s11548-024-03101-6. Epub 2024 Apr 1.
5
Surgical gesture classification from video and kinematic data.
Med Image Anal. 2013 Oct;17(7):732-45. doi: 10.1016/j.media.2013.04.007. Epub 2013 Apr 28.
6
Gesture Recognition in Robotic Surgery With Multimodal Attention.
IEEE Trans Med Imaging. 2022 Jul;41(7):1677-1687. doi: 10.1109/TMI.2022.3147640. Epub 2022 Jun 30.
7
Automated surgical skill assessment in RMIS training.
Int J Comput Assist Radiol Surg. 2018 May;13(5):731-739. doi: 10.1007/s11548-018-1735-5. Epub 2018 Mar 16.
8
A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.
IEEE Trans Biomed Eng. 2017 Sep;64(9):2025-2041. doi: 10.1109/TBME.2016.2647680. Epub 2017 Jan 4.
9
Predicting Surgical Experience After Robotic Nerve-sparing Radical Prostatectomy Simulation Using a Machine Learning-based Multimodal Analysis of Objective Performance Metrics.
Urol Pract. 2023 Sep;10(5):447-455. doi: 10.1097/UPJ.0000000000000426. Epub 2023 Jun 22.
10
Surgical gesture classification from video data.
Med Image Comput Comput Assist Interv. 2012;15(Pt 1):34-41. doi: 10.1007/978-3-642-33415-3_5.

Cited by

1
Untangling surgical gesture analysis-are we even speaking the same language? a systematic review.
Surg Endosc. 2025 Sep;39(9):5538-5557. doi: 10.1007/s00464-025-11907-x. Epub 2025 Jul 31.
2
Zero-shot prompt-based video encoder for surgical gesture recognition.
Int J Comput Assist Radiol Surg. 2025 Feb;20(2):311-321. doi: 10.1007/s11548-024-03257-1. Epub 2024 Sep 17.
3
Pelphix: Surgical Phase Recognition from X-ray Images in Percutaneous Pelvic Fixation.
Med Image Comput Comput Assist Interv. 2023 Oct;14228:133-143. doi: 10.1007/978-3-031-43996-4_13. Epub 2023 Oct 1.
4
Multimodal semi-supervised learning for online recognition of multi-granularity surgical workflows.
Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1075-1083. doi: 10.1007/s11548-024-03101-6. Epub 2024 Apr 1.
5
A vision transformer for decoding surgeon activity from surgical videos.
Nat Biomed Eng. 2023 Jun;7(6):780-796. doi: 10.1038/s41551-023-01010-8. Epub 2023 Mar 30.
6
The Impact of Machine Learning on 2D/3D Registration for Image-Guided Interventions: A Systematic Review and Perspective.
Front Robot AI. 2021 Aug 30;8:716007. doi: 10.3389/frobt.2021.716007. eCollection 2021.