Jiang Shenlu, Cui Runze, Wei Runze, Fu Zhiyang, Hong Zhonghua, Feng Guofu
School of Computer Science and Engineering, Macau University of Science and Technology, Macao, Macao SAR, China.
College of Information Technology, Shanghai Ocean University, Shanghai, China.
Front Neurorobot. 2023 Aug 28;17:1255085. doi: 10.3389/fnbot.2023.1255085. eCollection 2023.
Person-following is a crucial capability for service robots, and the use of vision technology is a leading trend in building environmental understanding. Most existing methods rely on a tracking-by-detection strategy, which requires extensive training datasets yet remains susceptible to environmental noise. We propose a different approach: real-time tracking-by-segmentation with a future motion estimation framework. The framework tracks a target individual at the pixel level and predicts their future motion. Our strategy uses a single-shot segmentation tracking neural network for precise foreground segmentation of the target, overcoming the limitations of a rectangular region of interest (ROI): while the ROI provides broad context, the segmentation within this bounding box gives a more detailed and accurate position of the human subject. To further improve the approach, a classification-lock pre-trained layer forms a constraint that curbs feature outliers originating from the person being tracked. A discriminative correlation filter estimates the potential target region in the scene to prevent foreground misrecognition, while a motion estimation neural network anticipates the target's future motion for use in the control module. We validated the proposed method on the VOT, LaSOT, YouTube-VOS, and DAVIS tracking datasets, demonstrating its effectiveness. Notably, our framework supports long-term person-following tasks in indoor environments, showing promise for practical deployment in service robots.
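The abstract's point that a segmentation mask localizes a person more accurately than a rectangular ROI can be illustrated with a toy example. The sketch below is not the authors' network or code. It assumes a hypothetical binary mask (1 = person pixel) and compares two position estimates: the center of the tightest bounding box and the centroid of the mask pixels. The names `roi_center`, `mask_centroid`, and `toy_mask` are introduced here for illustration only.

```python
from typing import List, Tuple

Mask = List[List[int]]  # hypothetical binary mask: 1 = person pixel, 0 = background


def roi_center(mask: Mask) -> Tuple[float, float]:
    """Center (x, y) of the tightest axis-aligned box around the foreground."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask has no foreground pixels")
    return ((min(cols) + max(cols)) / 2, (min(rows) + max(rows)) / 2)


def mask_centroid(mask: Mask) -> Tuple[float, float]:
    """Mean (x, y) of the foreground pixels themselves."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask has no foreground pixels")
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


# Toy mask: a body on the left with one arm stretched out to the right.
toy_mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]

print("ROI center:   ", roi_center(toy_mask))
print("Mask centroid:", mask_centroid(toy_mask))
```

In this toy case the outstretched arm widens the bounding box, so the ROI center shifts toward the arm (x = 2.5), while the mask centroid stays over the body (x = 1.5). A follower that steers toward the ROI center would therefore aim off-target whenever the person's silhouette is asymmetric.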