Google Inc., Mountain View, CA 94043, USA.
IEEE Trans Pattern Anal Mach Intell. 2013 Jul;35(7):1704-16. doi: 10.1109/TPAMI.2012.242.
Tracking and identifying players in sports videos filmed with a single pan-tilt-zoom camera has many applications, but it is also a challenging problem. This paper introduces a system that tackles this task. The system detects and tracks multiple players, estimates the homography between video frames and the court, and identifies the players. The identification system combines three weak visual cues and exploits both temporal and mutual exclusion constraints in a Conditional Random Field (CRF). In addition, we propose a novel Linear Programming (LP) Relaxation algorithm for predicting the best player identification in a video clip. To reduce the amount of labeled training data required to learn the identification system, we make use of weakly supervised learning with the assistance of play-by-play texts. Experiments show promising results in tracking, homography estimation, and identification. Moreover, weakly supervised learning with play-by-play texts greatly reduces the number of labeled training examples required: the identification system achieves similar accuracy with merely 200 labels under weak supervision, whereas a strongly supervised approach needs at least 20,000 labels.
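To make the mutual exclusion constraint concrete, the sketch below labels the detections in a single frame so that no player identity is used twice. It is only an illustration under stated assumptions: the score matrix is invented, the function name `best_exclusive_labeling` is hypothetical, and the brute-force search over permutations stands in for the CRF inference and LP Relaxation described in the abstract, which it does not reproduce.

```python
from itertools import permutations


def best_exclusive_labeling(scores):
    """Pick one distinct player per detection, maximizing the summed score.

    scores[d][p] is a hypothetical per-detection score that detection d
    shows player p (for example, from combined visual cues).
    Brute force: only practical for a handful of players per frame.
    """
    n_detections = len(scores)
    n_players = len(scores[0])
    best_total, best_assignment = float("-inf"), None
    # Each permutation assigns a different player to every detection,
    # which is exactly the mutual exclusion constraint.
    for assignment in permutations(range(n_players), n_detections):
        total = sum(scores[d][p] for d, p in enumerate(assignment))
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_assignment, best_total


# Made-up scores: rows are detections, columns are player identities.
scores = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]

# Labeling each detection independently gives player 0 to two detections.
independent = tuple(max(range(len(row)), key=lambda p: row[p]) for row in scores)

# Enforcing mutual exclusion resolves the conflict.
exclusive, total = best_exclusive_labeling(scores)

print("independent:", independent)
print("exclusive:  ", exclusive, "total score:", round(total, 2))
```

With these illustrative scores, the independent labeling assigns the same identity to the first two detections, while the exclusive labeling moves the weaker of the two to its next-best identity. A full system would additionally link labels across frames through temporal constraints, which this single-frame sketch omits.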