Altair Robotics Lab, University of Verona, 37134, Verona, Italy.
ICube, University of Strasbourg, CNRS, 67000, Strasbourg, France.
Int J Comput Assist Radiol Surg. 2023 Sep;18(9):1665-1672. doi: 10.1007/s11548-023-02864-8. Epub 2023 Mar 22.
Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities.
This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations within each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a temporal convolutional network (TCN).
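The following is a minimal sketch of the segment-wise consistent random augmentation idea described above. The function name, transform pool, and the num_segments, num_ops, and magnitude parameters are illustrative assumptions, not the authors' exact implementation.

    # Sketch of TRandAugment-style augmentation: split a clip into temporal
    # segments and apply the same randomly sampled transforms to every frame
    # within a segment (transform pool and parameters are assumptions).
    import random
    import torch
    import torchvision.transforms.functional as TF

    # Candidate transforms; each takes (frame tensor, magnitude in [0, 1]).
    TRANSFORM_POOL = [
        lambda f, m: TF.adjust_brightness(f, 1.0 + 0.6 * (m - 0.5)),
        lambda f, m: TF.adjust_contrast(f, 1.0 + 0.6 * (m - 0.5)),
        lambda f, m: TF.adjust_saturation(f, 1.0 + 0.6 * (m - 0.5)),
        lambda f, m: TF.rotate(f, angle=20.0 * (m - 0.5)),
        lambda f, m: TF.hflip(f),
    ]

    def t_rand_augment(video, num_segments=4, num_ops=2, magnitude=0.7):
        """Split `video` (T, C, H, W) into temporal segments and apply the same
        randomly chosen transforms to every frame within a segment."""
        num_frames = video.shape[0]
        boundaries = torch.linspace(0, num_frames, num_segments + 1).long()
        out = video.clone()
        for s in range(num_segments):
            start, end = boundaries[s].item(), boundaries[s + 1].item()
            # Sample ops once per segment so its frames are transformed consistently.
            ops = random.choices(TRANSFORM_POOL, k=num_ops)
            for t in range(start, end):
                frame = out[t]
                for op in ops:
                    frame = op(frame, magnitude)
                out[t] = frame
        return out

    # Example: augment a dummy 64-frame clip of 224x224 RGB frames.
    clip = torch.rand(64, 3, 224, 224)
    augmented = t_rand_augment(clip)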
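The end-to-end model is described only at a high level (ResNet50 followed by a TCN); the sketch below shows one plausible arrangement, assuming frame-wise ResNet50 features fed to a stack of dilated 1D convolutions. Layer sizes and the number of output classes are placeholder assumptions, not the paper's configuration.

    # Sketch of a frame-wise CNN backbone followed by a dilated temporal
    # convolutional network for per-frame phase/step classification.
    import torch
    import torch.nn as nn
    import torchvision

    class PhaseRecognizer(nn.Module):
        def __init__(self, num_classes=11, feat_dim=512, tcn_layers=4):
            super().__init__()
            backbone = torchvision.models.resnet50(weights=None)
            backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)  # per-frame features
            self.cnn = backbone
            # Stack of dilated 1D convolutions over the temporal dimension.
            self.tcn = nn.Sequential(*[
                nn.Sequential(
                    nn.Conv1d(feat_dim, feat_dim, kernel_size=3,
                              padding=2 ** i, dilation=2 ** i),
                    nn.ReLU(),
                )
                for i in range(tcn_layers)
            ])
            self.classifier = nn.Conv1d(feat_dim, num_classes, kernel_size=1)

        def forward(self, video):
            # video: (T, C, H, W) -> per-frame features (T, feat_dim)
            feats = self.cnn(video)
            x = feats.transpose(0, 1).unsqueeze(0)      # (1, feat_dim, T)
            x = self.tcn(x)
            return self.classifier(x).squeeze(0).transpose(0, 1)  # (T, num_classes)

    # Example: per-frame logits for a short dummy clip.
    logits = PhaseRecognizer()(torch.rand(8, 3, 224, 224))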
The effectiveness of the proposed method is demonstrated on two surgical video datasets, namely Bypass40 and CATARACTS, and on two tasks, surgical phase and step recognition. TRandAugment yields a performance boost of 1-6% over previous state-of-the-art methods that use manually designed augmentations.
This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.