Tang Na, Liao Yuehui, Chen Yu, Yang Guang, Lai Xiaobo, Chen Jing
School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou 310053, China.
Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK.
Sensors (Basel). 2025 Feb 20;25(5):1278. doi: 10.3390/s25051278.
Video portrait segmentation is essential for intelligent sensing systems, including human-computer interaction, autonomous navigation, and augmented reality. However, dynamic video environments introduce significant challenges, such as temporal variations, occlusions, and computational constraints. This study introduces RVM+, an enhanced video segmentation framework based on the Robust Video Matting (RVM) architecture. By incorporating Convolutional Gated Recurrent Units (ConvGRU), RVM+ improves temporal consistency and captures intricate temporal dynamics across video frames. Additionally, a novel knowledge distillation strategy reduces computational demands while maintaining high segmentation accuracy, making the framework well suited to real-time applications in resource-constrained environments. Comprehensive evaluations on challenging datasets show that RVM+ outperforms state-of-the-art methods in both segmentation accuracy and temporal consistency. Results on key metrics, namely mean intersection over union (MIoU), sum of absolute differences (SAD), and the temporal-coherence error dtSSD, confirm the robustness of the model. Integrating knowledge distillation yields a streamlined design with negligible accuracy trade-offs, highlighting its suitability for practical deployment. This study advances intelligent sensor technology by providing a high-performance, efficient, and scalable solution for video segmentation. RVM+ has potential applications in augmented reality, robotics, and real-time video analysis, and it contributes to the development of AI-enabled vision sensors.
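The abstract reports MIoU, SAD, and dtSSD without defining them. Below is a minimal pure-Python sketch of one common formulation of each, assuming alpha mattes stored as flattened lists of values in [0, 1]. The function names, the 0.5 binarization threshold used for MIoU, and the lack of any scaling factor are illustrative assumptions, not the authors' evaluation code. Matting benchmarks often rescale SAD and dtSSD, so absolute values from this sketch are not directly comparable to published numbers.

```python
import math
from typing import List, Sequence

# Hypothetical representation: one flattened alpha matte per frame, values in [0, 1].
Frame = Sequence[float]


def sad(pred: Frame, true: Frame) -> float:
    """Sum of absolute differences between predicted and ground-truth alpha (no rescaling)."""
    return sum(abs(p - t) for p, t in zip(pred, true))


def miou(pred: Frame, true: Frame, threshold: float = 0.5) -> float:
    """Mean IoU over background and foreground after binarizing at `threshold` (assumed 0.5)."""
    pb = [p >= threshold for p in pred]
    tb = [t >= threshold for t in true]
    ious = []
    for cls in (False, True):  # False = background, True = foreground
        inter = sum(1 for p, t in zip(pb, tb) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pb, tb) if p == cls or t == cls)
        if union > 0:  # skip a class absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 1.0


def dtssd(pred_seq: List[Frame], true_seq: List[Frame]) -> float:
    """Temporal coherence error: mean over t >= 1 of the L2 norm of
    (pred_t - pred_{t-1}) - (true_t - true_{t-1})."""
    if len(pred_seq) < 2:
        return 0.0
    total = 0.0
    for t in range(1, len(pred_seq)):
        sq = sum(
            ((p1 - p0) - (g1 - g0)) ** 2
            for p0, p1, g0, g1 in zip(pred_seq[t - 1], pred_seq[t], true_seq[t - 1], true_seq[t])
        )
        total += math.sqrt(sq)
    return total / (len(pred_seq) - 1)


# Two 2-pixel frames: the prediction is the ground truth plus a constant 0.25 offset.
true_seq = [[0.0, 0.0], [0.5, 0.0]]
pred_seq = [[0.25, 0.25], [0.75, 0.25]]
print(sad(pred_seq[0], true_seq[0]))   # 0.5  (static error is counted)
print(dtssd(pred_seq, true_seq))       # 0.0  (no frame-to-frame flicker)
print(miou(pred_seq[1], true_seq[1]))  # 1.0
```

The constant-offset example shows why accuracy and temporal metrics are reported together: SAD penalizes a static error in every frame, while dtSSD only penalizes changes between frames that the ground truth does not have, i.e. flicker.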