Oh Sang-Il, Kang Hang-Bong
Department of Media Engineering, Catholic University of Korea, 43-1, Yeoggok 2-dong, Wonmi-gu, Bucheon-si, Gyeonggi-do 14662, Korea.
Sensors (Basel). 2017 Apr 18;17(4):883. doi: 10.3390/s17040883.
Multiple-object tracking is affected by various sources of distortion, such as occlusion, illumination variations, and motion changes. Overcoming these distortions by tracking on RGB frames alone has limitations, because RGB frames are themselves subject to appearance distortions such as shifting. To overcome these distortions, we propose a multiple-object fusion tracker (MOFT), which uses a combination of 3D point clouds and the corresponding RGB frames. The MOFT uses a matching function, initialized on large-scale external sequences, to determine which candidates in the current frame match the target object in the previous frame. After tracking over a few frames, the initialized matching function is fine-tuned according to the appearance models of the target objects. The fine-tuning process of the matching function is constructed in a structured form with diverse matching-function branches. In general multiple-object tracking situations, scale variations occur within a scene depending on the distance between the target objects and the sensors. If target objects at various scales are all represented with the same strategy, information losses will occur in the representation of the target objects. In this paper, the output map of a convolutional layer obtained from a pre-trained convolutional neural network is used to adaptively represent instances without information loss. In addition, MOFT fuses the tracking results obtained from each modality at the decision level, using basic belief assignment to compensate for the tracking failures of each modality, rather than fusing modalities by selectively using the features of each modality. Experimental results indicate that the proposed tracker provides state-of-the-art performance on the multiple object tracking (MOT) and KITTI benchmarks.
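The decision-level fusion described above rests on combining basic belief assignments from the two modalities. A minimal sketch of this idea, using Dempster's rule of combination over a two-hypothesis frame (target vs. background), is shown below; the mass values and the `combine` helper are illustrative assumptions, not the paper's actual implementation:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two basic belief assignments.
    Focal elements are frozensets over the frame of discernment; mass
    falling on empty intersections is treated as conflict and
    renormalized away."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

T = frozenset({"target"})
B = frozenset({"background"})
THETA = T | B  # total ignorance: "target or background"

# Hypothetical masses: the RGB tracker is fairly confident, while the
# point-cloud tracker is (say) partially occluded and mostly uncertain.
m_rgb   = {T: 0.7, B: 0.1, THETA: 0.2}
m_cloud = {T: 0.3, B: 0.1, THETA: 0.6}

fused = combine(m_rgb, m_cloud)
```

Even though the point-cloud evidence is weak on its own, agreement between the modalities raises the fused belief in the target above either individual assignment, which is the sense in which one modality can compensate for the failures of the other.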