Ahmidi Narges, Tao Lingling, Sefati Shahin, Gao Yixin, Lea Colin, Bejar Haro Benjamin, Zappella Luca, Khudanpur Sanjeev, Vidal Rene, Hager Gregory D
IEEE Trans Biomed Eng. 2017 Sep;64(9):2025-2041. doi: 10.1109/TBME.2016.2647680. Epub 2017 Jan 4.
State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging.
In this paper, we address two major problems in surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: the hidden Markov model (HMM), a sparse HMM, the Markov/semi-Markov conditional random field, and the skip-chain conditional random field; and two feature-based approaches that classify fixed segments: bag of spatiotemporal features and linear dynamical systems.
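As a concrete illustration of the first family of techniques, the sketch below fits a Gaussian HMM to one trial's kinematic stream and reads the per-frame hidden-state sequence as a segmentation. This is a minimal sketch under stated assumptions, not the authors' implementation: the hmmlearn library, the synthetic X array, and the diagonal-covariance model are stand-ins for the JIGSAWS kinematics (76 motion variables per frame) and for a proper training pipeline that maps states to the annotated gesture vocabulary.

```python
# Minimal HMM segmentation sketch (assumption: hmmlearn is available;
# X is a synthetic placeholder for one JIGSAWS trial's kinematics).
import numpy as np
from hmmlearn import hmm

T, D = 2000, 76                 # frames x kinematic variables (JIGSAWS records 76)
X = np.random.randn(T, D)       # synthetic stand-in for real tool kinematics

# One hidden state per gesture; the JIGSAWS vocabulary defines 15 gestures.
model = hmm.GaussianHMM(n_components=15, covariance_type="diag", n_iter=50)
model.fit(X)                    # unsupervised fit on a single trial
states = model.predict(X)       # per-frame state sequence

# Contiguous runs of one state form segments; joint segmentation and
# classification follows once states are mapped to gesture labels.
boundaries = np.flatnonzero(np.diff(states)) + 1
```

In a supervised setting such as the paper's, gesture-specific models would be trained on labeled sequences so that decoding yields both boundaries and labels; the unsupervised fit above only conveys the mechanics.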
Most methods recognize gesture activities with approximately 80% overall accuracy under both the leave-one-super-trial-out (LOSO) and leave-one-user-out (LOUO) cross-validation settings.
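Both validation settings amount to grouped cross-validation with a different grouping variable: the repetition index (super-trial) for LOSO and the subject for LOUO. The sketch below shows the split protocol with scikit-learn's LeaveOneGroupOut; trial_features, trial_labels, and the 8-user by 5-repetition layout are hypothetical placeholders, not JIGSAWS data.

```python
# Sketch of LOSO vs. LOUO splits (assumption: per-trial feature vectors
# and labels already exist; all data here is synthetic).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials = 40                                     # e.g., 8 users x 5 repetitions
trial_features = rng.normal(size=(n_trials, 32))  # placeholder features
trial_labels = rng.integers(0, 3, size=n_trials)  # placeholder class labels
users = np.repeat(np.arange(8), 5)                # subject id per trial
supertrials = np.tile(np.arange(5), 8)            # repetition id per trial

logo = LeaveOneGroupOut()
for name, groups in [("LOSO", supertrials), ("LOUO", users)]:
    scores = []
    for tr, te in logo.split(trial_features, trial_labels, groups):
        clf = SVC().fit(trial_features[tr], trial_labels[tr])
        scores.append(clf.score(trial_features[te], trial_labels[te]))
    print(name, round(float(np.mean(scores)), 3))
```

LOUO is the harder setting because the held-out surgeon's motion style is never seen during training, which is why cross-surgeon generalization is singled out below as the main open problem.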
Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons.
The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.