IEEE Trans Med Imaging. 2022 Nov;41(11):3309-3319. doi: 10.1109/TMI.2022.3182995. Epub 2022 Oct 27.
Automatic surgical phase recognition plays a vital role in robot-assisted surgeries. Existing methods ignore a pivotal problem: surgical phases should be classified by learning segment-level semantics rather than relying solely on frame-wise information. This paper presents a segment-attentive hierarchical consistency network (SAHC) for surgical phase recognition from videos. The key idea is to extract hierarchical, semantically consistent high-level segments and use them to refine the erroneous predictions caused by ambiguous frames. To achieve this, we design a temporal hierarchical network to generate hierarchical high-level segments. We then introduce a hierarchical segment-frame attention module to capture relations between the low-level frames and high-level segments. By regularizing the predictions of frames and their corresponding segments via a consistency loss, the network generates semantically consistent segments and rectifies the misclassified predictions caused by ambiguous low-level frames. We validate SAHC on two public surgical video datasets, i.e., the M2CAI16 challenge dataset and the Cholec80 dataset. Experimental results show that our method outperforms previous state-of-the-art approaches, and ablation studies demonstrate the effectiveness of the proposed modules. Our code has been released at: https://github.com/xmed-lab/SAHC.
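To make the three ingredients described above more concrete (pooling frame features into coarser segment features, attending from frames to segments, and enforcing frame-segment prediction consistency), the following is a minimal PyTorch sketch. It is not the authors' released implementation (see the linked repository); the module names, the average-pooling stride used to form segments, and the KL-based form of the consistency term are illustrative assumptions.

```python
# Minimal sketch of the ideas in the abstract: frame features are pooled into
# coarser "segment" features, frames attend to segments via cross-attention,
# and a consistency loss aligns frame predictions with segment predictions.
# All design choices below (pooling stride, attention layer, KL loss) are
# assumptions for illustration, not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentFrameAttention(nn.Module):
    """Cross-attention: frame features (queries) attend to segment features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frames: torch.Tensor, segments: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, C), segments: (B, S, C); each frame is refined by
        # segment-level context before classification.
        refined, _ = self.attn(frames, segments, segments)
        return frames + refined


class ToySAHC(nn.Module):
    """Toy two-level hierarchy: frame features -> pooled segment features."""

    def __init__(self, dim: int = 64, num_classes: int = 7, pool: int = 8):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool, stride=pool)  # frames -> segments
        self.attn = SegmentFrameAttention(dim)
        self.frame_head = nn.Linear(dim, num_classes)
        self.segment_head = nn.Linear(dim, num_classes)

    def forward(self, frames: torch.Tensor):
        # frames: (B, T, C) pre-extracted per-frame features.
        segments = self.pool(frames.transpose(1, 2)).transpose(1, 2)  # (B, S, C)
        refined = self.attn(frames, segments)
        return self.frame_head(refined), self.segment_head(segments)


def consistency_loss(frame_logits, segment_logits, pool: int = 8):
    # Encourage each frame's prediction to agree with the prediction of the
    # segment it belongs to: KL divergence from the frame distribution to the
    # segment distribution broadcast back to frame resolution. The paper's
    # exact loss may differ; this is one plausible instantiation.
    seg_probs = segment_logits.softmax(dim=-1)                      # (B, S, K)
    seg_probs_per_frame = seg_probs.repeat_interleave(pool, dim=1)  # (B, S*pool, K)
    T = min(frame_logits.size(1), seg_probs_per_frame.size(1))
    frame_logp = F.log_softmax(frame_logits[:, :T], dim=-1)
    return F.kl_div(frame_logp, seg_probs_per_frame[:, :T], reduction="batchmean")


if __name__ == "__main__":
    feats = torch.randn(2, 64, 64)          # (batch, frames, feature dim)
    model = ToySAHC()
    frame_logits, segment_logits = model(feats)
    ce = F.cross_entropy(frame_logits.reshape(-1, 7),
                         torch.randint(0, 7, (2 * 64,)))
    loss = ce + consistency_loss(frame_logits, segment_logits)
    loss.backward()
    print(frame_logits.shape, segment_logits.shape, loss.item())
```

In this sketch the consistency term is what lets segment-level semantics correct isolated, ambiguous frame predictions; a deeper hierarchy would simply repeat the pooling and attention steps at additional temporal scales.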