Surgical gesture classification from video and kinematic data.

Affiliation

Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA.

Publication information

Med Image Anal. 2013 Oct;17(7):732-45. doi: 10.1016/j.media.2013.04.007. Epub 2013 Apr 28.

Abstract

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on dynamic cues (e.g., time to completion, speed, forces, torque) or kinematic data (e.g., robot trajectories and velocities). While videos could be equally or more discriminative (e.g., videos contain semantic information not present in kinematic data), they are typically not used because of the difficulties associated with automatic video interpretation. In this paper, we propose several methods for automatic surgical gesture classification from video data. We assume that the video of a surgical task (e.g., suturing) has been segmented into video clips corresponding to a single gesture (e.g., grabbing the needle, passing the needle) and propose three methods to classify the gesture of each video clip. In the first one, we model each video clip as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second one, we use spatio-temporal features extracted from each video clip to learn a dictionary of spatio-temporal words, and use a bag-of-features (BoF) approach to classify new video clips. In the third one, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Since the LDS approach is also applicable to kinematic data, we also use MKL to combine both types of data in order to exploit their complementarity. Our experiments on a typical surgical training setup show that methods based on video data perform as well as, if not better than, state-of-the-art approaches based on kinematic data. In turn, the combination of both kinematic and video data outperforms any other algorithm based on one type of data alone.
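A minimal sketch of the bag-of-features (BoF) step described in the abstract, using synthetic stand-ins for the spatio-temporal descriptors extracted from each video clip; the descriptor dimension, dictionary size, and SVM kernel here are illustrative assumptions, not the paper's settings.

```python
# Sketch of a BoF gesture classifier: cluster per-clip descriptors into a
# dictionary of "spatio-temporal words", represent each clip as a word
# histogram, and classify histograms with an SVM. Synthetic data stands in
# for real video descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for spatio-temporal descriptors: each clip yields a set of local
# descriptors (in the real pipeline these would come from the video itself).
def fake_clip_descriptors(label, n_desc=40, dim=32):
    return rng.normal(loc=label, scale=1.0, size=(n_desc, dim))

labels = rng.integers(0, 3, size=60)              # 3 hypothetical gesture classes
clips = [fake_clip_descriptors(y) for y in labels]

# 1) Learn the dictionary by clustering all descriptors from all clips.
dictionary = KMeans(n_clusters=20, n_init=10, random_state=0)
dictionary.fit(np.vstack(clips))

# 2) Represent each clip as a normalized histogram of word occurrences.
def bof_histogram(descriptors):
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()

X = np.array([bof_histogram(c) for c in clips])

# 3) Classify clips from their histogram representation.
clf = SVC(kernel="rbf", gamma="scale").fit(X[:40], labels[:40])
print("held-out accuracy:", clf.score(X[40:], labels[40:]))
```

The MKL step can be roughly mimicked in the same framework by computing one kernel matrix per representation (LDS distances, BoF histograms, kinematic features) and passing a fixed convex combination of them to SVC(kernel="precomputed"); MKL proper additionally learns the combination weights.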

