Surgical gesture classification from video data.

Authors

Benjamín Béjar Haro, Luca Zappella, René Vidal

Affiliation

Center for Imaging Science, Johns Hopkins University, USA.

Publication

Med Image Comput Comput Assist Interv. 2012;15(Pt 1):34-41. doi: 10.1007/978-3-642-33415-3_5.

Abstract

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper we show that in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first one, we model each video clip from each surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words and use a bag-of-features (BoF) approach to classify new video clips. In the third approach, we use multiple kernel learning to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform equally well as the state-of-the-art approaches based on kinematic data.
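As an illustration of the second approach described in the abstract, the sketch below outlines a generic bag-of-features pipeline for clip classification. It is not the authors' implementation: it assumes spatio-temporal descriptors (e.g., histograms of gradients and optical flow around space-time interest points) have already been extracted for each clip, and the dictionary size, clustering method, and SVM kernel are placeholder choices.

# Minimal bag-of-features sketch for gesture clip classification (illustrative only).
# Assumes each clip is represented by an (n_i, d) array of spatio-temporal descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_dictionary(train_descriptors, n_words=100, seed=0):
    # Cluster all training descriptors into a dictionary of spatio-temporal "words".
    stacked = np.vstack(train_descriptors)  # (total_features, descriptor_dim)
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(stacked)

def encode_clip(descriptors, dictionary):
    # Represent one clip as a normalized histogram of word occurrences.
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_bof_classifier(train_descriptors, train_labels, n_words=100):
    # train_descriptors: list of (n_i, d) arrays, one per training clip
    # train_labels: gesture label per training clip
    dictionary = build_dictionary(train_descriptors, n_words)
    X = np.array([encode_clip(d, dictionary) for d in train_descriptors])
    clf = SVC(kernel="rbf", gamma="scale").fit(X, train_labels)
    return dictionary, clf

def classify_clip(descriptors, dictionary, clf):
    # Predict the gesture label of a new clip from its descriptor set.
    return clf.predict(encode_clip(descriptors, dictionary)[None, :])[0]

The paper's third approach would instead combine a kernel computed on such histograms with a kernel derived from distances between linear dynamical systems via multiple kernel learning; that combination is not shown here.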

