Patil Prashant W, Dudhane Akshay, Kulkarni Ashutosh, Murala Subrahmanyam, Gonde Anil Balaji, Gupta Sunil
IEEE Trans Image Process. 2021;30:7889-7902. doi: 10.1109/TIP.2021.3108405. Epub 2021 Sep 20.
Moving object segmentation (MOS) in videos has received considerable attention because of its broad security-related applications such as robotics, outdoor video surveillance, and self-driving cars. Current prevailing algorithms depend heavily on additional modules trained for other tasks, require complicated training procedures, or neglect inter-frame spatio-temporal structural dependencies. To address these issues, a simple, robust, and effective unified recurrent edge aggregation approach is proposed for MOS, which requires neither additional trained modules nor fine-tuning on test video frame(s). A recurrent edge aggregation module (REAM) is proposed to extract foreground-relevant features that capture spatio-temporal structural dependencies, with encoder and corresponding decoder features connected recurrently from the previous frame. These REAM features are then passed to the decoder through skip connections for comprehensive learning, termed temporal information propagation. Further, a motion refinement block with multi-scale dense residual connections is proposed to combine features from the optical-flow encoder stream and the last REAM module for holistic feature learning. Finally, these holistic features and the REAM features are fed to the decoder block for segmentation. To guide the decoder block, the previous frame's output at the respective scales is utilized. Different training-testing configurations are examined to evaluate the performance of the proposed method. Notably, outdoor videos often suffer from constrained visibility under varying environmental conditions, where small airborne particles scatter light in the atmosphere. Thus, a comprehensive result analysis is conducted on six benchmark video datasets covering different surveillance environments.
We demonstrate that the proposed method outperforms state-of-the-art MOS methods without any pre-trained module, fine-tuning on test video frame(s), or complicated training.
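To make the recurrent edge aggregation idea concrete, the sketch below is a toy NumPy illustration, not the paper's implementation: `edge_map` stands in for a learned edge extractor, and `ream_step` blends the current frame's edge response with the previous frame's aggregated features using a fixed weight `alpha` (a hypothetical parameter; the paper learns this combination with convolutional layers and an encoder-decoder).

```python
import numpy as np

def edge_map(feat):
    # Gradient-magnitude edge response; a stand-in for a learned edge extractor.
    gy = np.abs(np.diff(feat, axis=0, prepend=feat[:1]))
    gx = np.abs(np.diff(feat, axis=1, prepend=feat[:, :1]))
    return gx + gy

def ream_step(enc_feat, prev_agg, alpha=0.5):
    # Toy recurrent aggregation: mix the current frame's edges with the
    # previous frame's aggregated features (fixed weight; learned in the paper).
    return alpha * edge_map(enc_feat) + (1.0 - alpha) * prev_agg

def run_sequence(frames):
    # Propagate aggregated edge features recurrently across a frame sequence.
    agg = np.zeros_like(frames[0])
    outs = []
    for f in frames:
        agg = ream_step(f, agg)
        outs.append(agg)
    return outs
```

Because each step depends on the previous frame's aggregate, the output at frame t carries (exponentially decayed) structural evidence from all earlier frames, which is the spatio-temporal dependency the abstract describes.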