Department of Computer Science, ETH Zurich, Zurich, Switzerland.
Division of Gynecology Department OB/GYN, University Hospital, Zurich, Switzerland.
Int J Comput Assist Radiol Surg. 2021 Nov;16(11):2037-2044. doi: 10.1007/s11548-021-02493-z. Epub 2021 Sep 20.
Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize the activities performed by users and assess them. Since expert annotation of trajectories is expensive, methods are needed that can learn to recognize surgical activities in a data-efficient way.
We use self-supervised training of deep encoder-decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. These features then serve as inputs to an unsupervised surgical activity recognition pipeline.
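As an illustration of this kind of self-supervised setup, the following minimal PyTorch sketch trains a small convolutional encoder with a progress-prediction head on stand-in video frames and then reuses the encoder output as a per-frame feature vector. The architecture, layer sizes, and the class name ProgressEncoderDecoder are illustrative assumptions, not the network described in the paper.

```python
# Illustrative sketch (not the authors' architecture): a convolutional encoder
# with a decoder head trained on a self-supervised target -- the remaining
# surgery progress -- whose latent code can later serve as a feature
# representation for activity recognition.
import torch
import torch.nn as nn


class ProgressEncoderDecoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: compress a 64x64 video frame into a low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        # Decoder head: predict remaining progress in [0, 1] from the latent code.
        self.progress_head = nn.Sequential(
            nn.Linear(latent_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid(),
        )

    def forward(self, frames):
        z = self.encoder(frames)          # latent representation (features)
        progress = self.progress_head(z)  # predicted remaining progress
        return z, progress


# Self-supervised training loop on synthetic stand-in data: the target
# "remaining progress" can be derived from a frame's position within the
# recorded trajectory, so no expert annotations are required.
model = ProgressEncoderDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

frames = torch.rand(8, 3, 64, 64)   # batch of video frames (placeholder data)
remaining = torch.rand(8, 1)        # fraction of the surgery still ahead

for _ in range(5):
    optimizer.zero_grad()
    _, pred = model(frames)
    loss = loss_fn(pred, remaining)
    loss.backward()
    optimizer.step()

# After training, the encoder output can be used as a per-frame feature
# vector feeding an unsupervised activity recognition pipeline.
features, _ = model(frames)
```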
Our experiments show that hidden semi-Markov models recognizing activities in a simulated myomectomy scenario perform better when given features extracted from representations learned by a deep encoder-decoder network trained to predict the remaining surgery progress.
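The recognition stage in the paper uses hidden semi-Markov models; as a simplified stand-in, the sketch below fits a plain Gaussian HMM from hmmlearn to the learned per-frame features without labels and decodes an activity state for each frame. The random feature array, the number of activities, and the substitution of an ordinary HMM for a semi-Markov model are assumptions made purely for illustration.

```python
# Simplified stand-in for the unsupervised recognition stage: fit a Gaussian
# HMM (hmmlearn) to per-frame features and decode one activity state per frame.
# The paper uses hidden semi-Markov models; a plain HMM is shown here instead.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))    # per-frame features from the encoder (placeholder)
n_activities = 5                         # assumed number of surgical activities

hmm = GaussianHMM(n_components=n_activities, covariance_type="diag", n_iter=50)
hmm.fit(features)                        # unsupervised EM fit -- no labels used
activity_sequence = hmm.predict(features)  # Viterbi decoding of activity states per frame

print(activity_sequence[:20])
```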
Our work is an important first step toward making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the available data is annotated by human domain experts and where those annotations may be incomplete.