Video-based surgical skill assessment using 3D convolutional neural networks.
Affiliations
Division of Translational Surgical Oncology, National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany.
Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TU Dresden, Dresden, Germany.
Publication information
Int J Comput Assist Radiol Surg. 2019 Jul;14(7):1217-1225. doi: 10.1007/s11548-019-01995-1. Epub 2019 May 18.
PURPOSE
Thorough training of novice surgeons is crucial to ensure that surgical interventions are effective and safe. One important aspect is teaching the technical skills required for minimally invasive or robot-assisted procedures. This includes objective and, preferably, automatic assessment of surgical skill. Recent studies have reported good results for automatic, objective skill evaluation based on collecting and analyzing motion data, such as trajectories of surgical instruments. However, obtaining such motion data generally requires additional equipment for instrument tracking or access to a robotic surgery system that captures kinematic data. In contrast, we investigate a method for automatic, objective skill assessment that requires video data only. This has the advantage that video can be collected effortlessly during minimally invasive and robot-assisted training scenarios.
METHODS
Our method builds on recent advances in deep learning-based video classification. Specifically, we propose to use an inflated 3D ConvNet to classify snippets, i.e., stacks of a few consecutive frames, extracted from surgical video. The network is extended into a temporal segment network during training.
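The temporal segment network extension described above can be illustrated with a small sketch: the video is divided into a fixed number of equal segments, one snippet (a stack of consecutive frames) is sampled from each segment, and the per-snippet class scores are fused into a video-level prediction by averaging. The segment count, snippet length, and averaging consensus below are illustrative assumptions, not the paper's exact configuration; the snippet classifier itself (the inflated 3D ConvNet) is omitted and stands in as a list of per-snippet scores.

```python
# Illustrative sketch of temporal-segment snippet sampling and score fusion.
# Segment count, snippet length, and the averaging consensus are assumptions
# for demonstration; the 3D ConvNet classifier itself is not shown.
import random


def sample_snippets(num_frames, num_segments=3, snippet_len=16, rng=None):
    """Split a video into equal segments and draw one snippet
    (a run of consecutive frame indices) from each segment."""
    rng = rng or random.Random(0)
    seg_len = num_frames // num_segments
    snippets = []
    for s in range(num_segments):
        start_lo = s * seg_len
        # Keep the snippet inside its segment where possible.
        start_hi = max(start_lo, (s + 1) * seg_len - snippet_len)
        start = rng.randint(start_lo, start_hi)
        snippets.append(list(range(start, start + snippet_len)))
    return snippets


def consensus(snippet_scores):
    """Fuse per-snippet class scores into a video-level score by averaging."""
    n = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [sum(s[c] for s in snippet_scores) / n for c in range(num_classes)]
```

At test time, the video-level skill class would then be the arg-max of the fused scores; averaging is one common consensus choice, used here purely for illustration.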
RESULTS
We evaluate the method on the publicly available JIGSAWS dataset, which consists of recordings of basic robot-assisted surgery tasks performed on a dry-lab bench-top model. Our approach achieves high skill classification accuracies, ranging from 95.1% to 100.0%.
CONCLUSIONS
Our results demonstrate the feasibility of deep learning-based assessment of technical skill from surgical video. Notably, the 3D ConvNet is able to learn meaningful patterns directly from the data, alleviating the need for manual feature engineering. Further evaluation will require more annotated data for training and testing.