Zhu Linchao, Yang Yi
IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):273-285. doi: 10.1109/TPAMI.2020.3007511. Epub 2021 Dec 7.
In this paper, we propose to leverage freely available unlabeled video data to facilitate few-shot video classification. In this semi-supervised few-shot video classification task, millions of unlabeled videos are available for each episode during training. These videos can be extremely imbalanced, yet they exhibit rich visual and motion dynamics. To tackle the semi-supervised few-shot video classification problem, we make the following contributions. First, we propose a label independent memory (LIM) to cache label related features, which enables a similarity search over a large set of videos. LIM produces a class prototype for few-shot training. This prototype is an aggregated embedding for each class, which is more robust to noisy video features. Second, we integrate a multi-modality compound memory network to capture both RGB and flow information. We propose to store the RGB and flow representations in two separate memory networks, which are jointly optimized via a unified loss. In this way, mutual communication between the two modalities is leveraged to achieve better classification performance. Third, we conduct extensive experiments on the few-shot Kinetics-100 and Something-Something-100 datasets, which validate the effectiveness of leveraging accessible unlabeled data for few-shot classification.
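The prototype construction described above (a similarity search over a feature cache, followed by aggregation into a per-class embedding) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the cosine-similarity retrieval, the softmax weighting, and the final averaging with the support feature are all assumptions made for clarity.

```python
import numpy as np

def aggregate_prototype(support_feat, memory, k=5, temperature=0.1):
    """Refine a support feature into a class prototype by aggregating
    its k most similar entries from a cached feature memory.
    NOTE: a hypothetical sketch, not the paper's exact LIM update rule."""
    # Cosine similarity between the support feature and every cached feature.
    mem_norm = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    q_norm = support_feat / np.linalg.norm(support_feat)
    sims = mem_norm @ q_norm

    # Retrieve the k most similar unlabeled-video features.
    top_idx = np.argsort(sims)[-k:]
    top_sims = sims[top_idx]

    # Softmax weights over the retrieved similarities.
    w = np.exp(top_sims / temperature)
    w /= w.sum()

    # Aggregated embedding: similarity-weighted mean of the retrieved
    # features, averaged with the original support feature so the
    # prototype stays anchored to the labeled example.
    retrieved = (w[:, None] * memory[top_idx]).sum(axis=0)
    return 0.5 * (support_feat + retrieved)

rng = np.random.default_rng(0)
memory = rng.normal(size=(1000, 64))   # cached unlabeled-video features
support = rng.normal(size=64)          # one labeled support example
proto = aggregate_prototype(support, memory)
print(proto.shape)  # (64,)
```

Because the prototype averages many retrieved embeddings rather than relying on a single support feature, a noisy individual video has less influence on the class representation, which is the robustness property the abstract highlights.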