IEEE Trans Cybern. 2017 May;47(5):1180-1197. doi: 10.1109/TCYB.2016.2539546. Epub 2016 Mar 28.
Multimedia event detection has been one of the major endeavors in video event analysis, and a variety of approaches have recently been proposed to tackle it. Among them, semantic representation has been credited with promising performance and a desirable capacity for human-understandable reasoning. To generate a semantic representation, we usually take several external image/video archives and apply concept detectors trained on them to the event videos. Because these archives differ intrinsically, the resulting representations are likely to have different predictive capabilities for a given event. Nevertheless, little work has assessed the efficacy of semantic representation at the source level. Moreover, it is reasonable to expect that some concepts are noisy for detecting a specific event. Motivated by these two shortcomings, we propose a bi-level semantic representation analysis method. At the source level, our method learns weights for the semantic representations obtained from different multimedia archives. Meanwhile, at the overall concept level, it restrains the negative influence of noisy or irrelevant concepts. In addition, we focus on efficient multimedia event detection with few positive examples, which is highly desirable in real-world scenarios. We perform extensive experiments on the challenging TRECVID MED 2013 and 2014 datasets, with encouraging results that validate the efficacy of our proposed approach.
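The abstract does not state the optimization problem, so the following is only an illustrative stand-in for the bi-level idea (weighting whole sources while suppressing individual noisy concepts). It uses the sparse group lasso, a standard regularizer that pairs a group-level L2 penalty (here, one group per archive) with an elementwise L1 penalty (here, one entry per concept), solved by proximal gradient descent. This is not the authors' formulation: the squared loss, the penalty values, the function names `sparse_group_lasso` and `soft_threshold`, and reading source weights off normalized block norms are all assumptions made for the sketch.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Elementwise soft-thresholding: the proximal operator of thresh * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def sparse_group_lasso(sources, y, lam_concept=0.05, lam_source=0.2, n_iter=500):
    """Hypothetical bi-level detector (sparse group lasso stand-in, not the paper's model).

    sources: list of (n_videos, n_concepts_s) concept-score arrays, one per archive.
    y: (n_videos,) event labels in {-1, +1}.
    Minimizes 1/(2n)||Xw - y||^2 + lam_concept*||w||_1
              + lam_source * sum_s sqrt(p_s) * ||w_s||_2
    Returns the concept weights w and per-source weights (normalized block norms).
    """
    X = np.hstack(sources)
    n = X.shape[0]
    # Column boundaries of each source's block of concepts.
    bounds = np.cumsum([0] + [s.shape[1] for s in sources])
    blocks = list(zip(bounds[:-1], bounds[1:]))
    # Step size 1/L, L = largest singular value squared / n (gradient Lipschitz constant).
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        # Concept level: elementwise shrinkage zeroes out individual weak concepts.
        v = soft_threshold(w - step * grad, step * lam_concept)
        # Source level: group shrinkage can zero out an entire archive's block.
        for a, b in blocks:
            norm = np.linalg.norm(v[a:b])
            shrink = step * lam_source * np.sqrt(b - a)
            v[a:b] = 0.0 if norm <= shrink else (1.0 - shrink / norm) * v[a:b]
        w = v
    block_norms = np.array([np.linalg.norm(w[a:b]) for a, b in blocks])
    total = block_norms.sum()
    source_weights = block_norms / total if total > 0 else block_norms
    return w, source_weights
```

A usage example on synthetic data, where the first "archive" carries the event signal and the second is pure noise:

```python
rng = np.random.default_rng(0)
informative = rng.standard_normal((200, 5))
noisy = rng.standard_normal((200, 5))
y = np.sign(informative[:, 0] - informative[:, 1])
w, weights = sparse_group_lasso([informative, noisy], y)
```

In this setup the noisy archive's block is driven to zero, so nearly all of the source weight goes to the informative archive, and within it only the two concepts that define the labels keep large coefficients.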