IEEE Trans Image Process. 2022;31:1830-1840. doi: 10.1109/TIP.2021.3139178. Epub 2022 Feb 16.
Identifying the same persons across different views plays an important role in many vision applications. In this paper, we study this problem, denoted as Multi-view Multi-Human Association (MvMHA), on multi-view images taken by different cameras at the same time. Unlike previous works on human association across two views, this paper focuses on the more general and challenging scenario of more than two views, none of which is fixed or known a priori. In addition, each involved person may be present in all the views or in only a subset of them, and this subset is also not known a priori. We develop a new end-to-end framework based on deep networks to address this problem. First, we use an appearance-based deep network to extract a feature for each detected subject in each image. We then compute pairwise similarity scores between all the detected subjects and construct a comprehensive affinity matrix. Finally, we propose a Deep Assignment Network (DAN) to transform the affinity matrix into an assignment matrix, which provides a binary assignment result for MvMHA. We build both a synthetic dataset and a real image dataset to verify the effectiveness of the proposed method. We also test the trained network on three other public datasets, where it achieves very good cross-domain performance.
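The data flow described above (per-detection features, a pairwise affinity matrix, then a binary assignment matrix) can be illustrated with a minimal sketch. It is not the paper's method: the feature vectors are made-up toy values standing in for the output of the appearance network, cosine similarity is an assumed choice of similarity score, and the learned DAN is replaced here by a plain threshold followed by transitive closure. The names `affinity_matrix`, `threshold_assignment`, and the `threshold` value are hypothetical.

```python
import numpy as np


def affinity_matrix(features):
    """Cosine similarity between all detections, stacked across all views.

    Assumption: cosine similarity stands in for the paper's similarity score.
    """
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    return f @ f.T


def threshold_assignment(affinity, view_ids, threshold=0.5):
    """Hand-written stand-in for DAN (which is a learned network).

    Links detections from different views whose affinity reaches the
    threshold, then groups linked detections with union-find so that the
    result is a symmetric, transitive 0/1 matrix. Limitation: the closure
    does not prevent two detections of one view from ending up in the same
    group through a chain of links.
    """
    n = affinity.shape[0]
    view_ids = np.asarray(view_ids)
    link = affinity >= threshold
    # Two detections in the same image cannot be the same person.
    link &= view_ids[:, None] != view_ids[None, :]
    np.fill_diagonal(link, True)

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if link[i, j]:
                parent[find(i)] = find(j)

    labels = np.array([find(i) for i in range(n)])
    return (labels[:, None] == labels[None, :]).astype(int)


# Toy example: three views, three persons (A, B, C), each seen in a subset of views.
features = [
    [1.0, 0.0, 0.0],  # view 0, person A
    [0.0, 1.0, 0.0],  # view 0, person B
    [0.9, 0.1, 0.0],  # view 1, person A
    [0.0, 0.0, 1.0],  # view 1, person C (appears in one view only)
    [0.1, 0.9, 0.0],  # view 2, person B
]
view_ids = [0, 0, 1, 1, 2]

affinity = affinity_matrix(features)
assignment = threshold_assignment(affinity, view_ids)
print(assignment)
```

Entry (i, j) of `assignment` is 1 when detections i and j are assigned to the same person. A person visible in only one view, such as C in the toy data, is matched only to itself, which mirrors the case of subjects present in just a subset of views.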