IEEE Trans Med Imaging. 2021 Jul;40(7):1911-1923. doi: 10.1109/TMI.2021.3069471. Epub 2021 Jun 30.
Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled spatial features with short, fixed-range temporal information, or learned visual and long-range temporal cues separately. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) that relates long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank that serves as a memory cell storing rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator that attentively relates the past to the present. In this way, TMRNet enables the current feature to capture long-range temporal dependencies and to tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, the M2CAI challenge dataset and the Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, which consistently exceeds state-of-the-art methods by a large margin (e.g., 67.0% vs. 78.9% Jaccard on the Cholec80 dataset for the previous state of the art and our method, respectively).
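The abstract names three components: a long-range memory bank, a temporal variation layer built from multi-scale temporal-only convolutions, and a non-local bank operator that attends from the present feature to the stored past. The sketch below shows one way these pieces could fit together on plain per-frame feature vectors. It is not the authors' TMRNet implementation: the helper names (`MemoryBank`, `temporal_conv_same`, `temporal_variation_layer`, `non_local_bank_operator`, `augment`) are hypothetical, and the fixed box-filter kernels (the paper's convolutions are learned), the widths (1, 3, 5), fusing scales by element-wise max, attention without learned projections, and the residual addition are all assumptions made for illustration.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


class MemoryBank:
    """Hypothetical FIFO memory bank holding the `length` most recent frame features."""

    def __init__(self, length: int) -> None:
        # deque(maxlen=...) drops the oldest frame once the bank is full.
        self.frames: Deque[Vector] = deque(maxlen=length)

    def push(self, feature: Sequence[float]) -> None:
        self.frames.append(list(feature))

    def read(self) -> List[Vector]:
        return list(self.frames)


def temporal_conv_same(seq: Sequence[Vector], width: int) -> List[Vector]:
    """Per-channel box filter along time only; output has the same length as `seq`.

    Near the edges the window is truncated and the average is taken over the
    frames actually present (an assumed edge rule, not zero padding).
    """
    half = width // 2
    out: List[Vector] = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): min(len(seq), t + half + 1)]
        out.append([sum(v[c] for v in window) / len(window) for c in range(len(seq[t]))])
    return out


def temporal_variation_layer(memory: Sequence[Vector],
                             widths: Sequence[int] = (1, 3, 5)) -> List[Vector]:
    """Multi-scale temporal-only filtering; scales fused by element-wise max (assumed)."""
    scales = [temporal_conv_same(memory, w) for w in widths]
    return [
        [max(s[t][c] for s in scales) for c in range(len(memory[t]))]
        for t in range(len(memory))
    ]


def non_local_bank_operator(current: Sequence[float], bank: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention from the current feature (query) to a non-empty bank.

    No learned projections; the attended summary is added back residually (assumed).
    """
    d = len(current)
    scores = [sum(q * k for q, k in zip(current, m)) / math.sqrt(d) for m in bank]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [sum(w * m[c] for w, m in zip(weights, bank)) for c in range(d)]
    return [x + a for x, a in zip(current, attended)]


def augment(current: Sequence[float], bank: MemoryBank) -> Vector:
    """Relate the past (enhanced memory) to the present frame feature."""
    past = bank.read()
    if not past:
        return list(current)  # nothing to attend to yet
    return non_local_bank_operator(current, temporal_variation_layer(past))


if __name__ == "__main__":
    bank = MemoryBank(length=4)
    for frame in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]):
        enhanced = augment(frame, bank)  # attend to past frames only
        bank.push(frame)                 # then store the present frame
        print(enhanced)
```

Calling `augment` before `push` keeps the bank strictly in the past, in line with the abstract's description of relating the past to the present. A trained model would replace the fixed box filters and the bare dot product with learned temporal convolutions and projections.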