Multimodal semi-supervised learning for online recognition of multi-granularity surgical workflows.

Affiliations

Department of Micro-Nano Mechanical Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8603, Japan.

Institutes of Innovation for Future Society, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan.

Publication Information

Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1075-1083. doi: 10.1007/s11548-024-03101-6. Epub 2024 Apr 1.

DOI: 10.1007/s11548-024-03101-6
PMID: 38558289
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11178653/
Abstract

Purpose

Surgical workflow recognition is a challenging task that requires understanding multiple aspects of surgery, such as gestures, phases, and steps. However, most existing methods focus on single-task or single-modal models and rely on costly annotations for training. To address these limitations, we propose a novel semi-supervised learning approach that leverages multimodal data and self-supervision to create meaningful representations for various surgical tasks.

Methods

Our representation learning approach consists of two stages. In the first stage, time contrastive learning is used to learn spatiotemporal visual features from video data without any labels. In the second stage, a multimodal VAE fuses the visual features with kinematic data to obtain a shared representation, which is fed into recurrent neural networks for online recognition.

Results

Our method is evaluated on two datasets, JIGSAWS and MISAW. It achieved performance comparable to or better than fully supervised models specialized for each task in multi-granularity workflow recognition. On the JIGSAWS Suturing dataset, we achieve a gesture recognition accuracy of 83.3%. In addition, our model is more efficient in annotation usage, maintaining high performance with only half of the labels. On the MISAW dataset, we achieve 84.0% AD-Accuracy in phase recognition and 56.8% AD-Accuracy in step recognition.

Conclusion

Our multimodal representation exhibits versatility across various surgical tasks and enhances annotation efficiency. This work has significant implications for real-time decision-making systems in the operating room.
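The two-stage pipeline in the Methods can be illustrated with a toy sketch. This is not the authors' implementation: it shows an InfoNCE-style time-contrastive loss (stage 1, pulling temporally adjacent frame embeddings together) and product-of-experts Gaussian fusion, one common way multimodal VAEs merge per-modality posteriors into a shared latent (stage 2). All function names and shapes are hypothetical.

```python
import numpy as np

def time_contrastive_loss(anchor, positive, negatives, temp=0.1):
    """InfoNCE-style loss: the anchor frame embedding should be more
    similar to a temporally adjacent (positive) frame than to
    temporally distant (negative) frames."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temp
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive is index 0

def poe_fuse(mu_vis, var_vis, mu_kin, var_kin):
    """Product-of-experts fusion of two diagonal-Gaussian posteriors
    (visual and kinematic) into a shared latent: precisions add,
    and the fused mean is precision-weighted."""
    prec_vis, prec_kin = 1.0 / var_vis, 1.0 / var_kin
    var = 1.0 / (prec_vis + prec_kin)
    mu = var * (prec_vis * mu_vis + prec_kin * mu_kin)
    return mu, var
```

The fused mean and variance would then be sampled and fed to a recurrent network for frame-by-frame (online) gesture, phase, or step prediction; with equal per-modality variances, the fusion reduces to averaging the two means and halving the variance.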


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d94/11178653/4c52ad751655/11548_2024_3101_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d94/11178653/6ebbad5b5673/11548_2024_3101_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d94/11178653/9358cf0a5f94/11548_2024_3101_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d94/11178653/0b242f6753a2/11548_2024_3101_Fig4_HTML.jpg

Similar Articles

1
Multimodal semi-supervised learning for online recognition of multi-granularity surgical workflows.
Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1075-1083. doi: 10.1007/s11548-024-03101-6. Epub 2024 Apr 1.
2
Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition.
Med Image Anal. 2021 Oct;73:102158. doi: 10.1016/j.media.2021.102158. Epub 2021 Jul 8.
3
MIcro-surgical anastomose workflow recognition challenge report.
Comput Methods Programs Biomed. 2021 Nov;212:106452. doi: 10.1016/j.cmpb.2021.106452. Epub 2021 Oct 10.
4
Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery.
Int J Comput Assist Radiol Surg. 2021 May;16(5):779-787. doi: 10.1007/s11548-021-02343-y. Epub 2021 Mar 24.
5
Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation.
Med Image Anal. 2023 Jul;87:102792. doi: 10.1016/j.media.2023.102792. Epub 2023 Mar 11.
6
A microdiscectomy surgical video annotation framework for supervised machine learning applications.
Int J Comput Assist Radiol Surg. 2024 Oct;19(10):1947-1952. doi: 10.1007/s11548-024-03203-1. Epub 2024 Jul 19.
7
Semi-Supervised Joint Learning for Hand Gesture Recognition from a Single Color Image.
Sensors (Basel). 2021 Feb 2;21(3):1007. doi: 10.3390/s21031007.
8
LRTD: long-range temporal dependency based active learning for surgical workflow recognition.
Int J Comput Assist Radiol Surg. 2020 Sep;15(9):1573-1584. doi: 10.1007/s11548-020-02198-9. Epub 2020 Jun 25.
9
A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
10
Cross-view motion consistent self-supervised video inter-intra contrastive for action representation understanding.
Neural Netw. 2024 Nov;179:106578. doi: 10.1016/j.neunet.2024.106578. Epub 2024 Jul 26.

Cited By

1
Untangling surgical gesture analysis-are we even speaking the same language? a systematic review.
Surg Endosc. 2025 Sep;39(9):5538-5557. doi: 10.1007/s00464-025-11907-x. Epub 2025 Jul 31.

References

1
Gesture Recognition in Robotic Surgery With Multimodal Attention.
IEEE Trans Med Imaging. 2022 Jul;41(7):1677-1687. doi: 10.1109/TMI.2022.3147640. Epub 2022 Jun 30.
2
MIcro-surgical anastomose workflow recognition challenge report.
Comput Methods Programs Biomed. 2021 Nov;212:106452. doi: 10.1016/j.cmpb.2021.106452. Epub 2021 Oct 10.
3
Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition.
Med Image Anal. 2021 Oct;73:102158. doi: 10.1016/j.media.2021.102158. Epub 2021 Jul 8.
4
Cross-modal self-supervised representation learning for gesture and skill recognition in robotic surgery.
Int J Comput Assist Radiol Surg. 2021 May;16(5):779-787. doi: 10.1007/s11548-021-02343-y. Epub 2021 Mar 24.
5
Gesture Recognition in Robotic Surgery: A Review.
IEEE Trans Biomed Eng. 2021 Jun;68(6):2021-2035. doi: 10.1109/TBME.2021.3054828. Epub 2021 May 21.
6
Surgical data science for next-generation interventions.
Nat Biomed Eng. 2017 Sep;1(9):691-696. doi: 10.1038/s41551-017-0132-7.
7
A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.
IEEE Trans Biomed Eng. 2017 Sep;64(9):2025-2041. doi: 10.1109/TBME.2016.2647680. Epub 2017 Jan 4.
8
Online time and resource management based on surgical workflow time series analysis.
Int J Comput Assist Radiol Surg. 2017 Feb;12(2):325-338. doi: 10.1007/s11548-016-1474-4. Epub 2016 Aug 29.
9
Training products of experts by minimizing contrastive divergence.
Neural Comput. 2002 Aug;14(8):1771-800. doi: 10.1162/089976602760128018.