IEEE Trans Pattern Anal Mach Intell. 2016 Jan;38(1):14-29. doi: 10.1109/TPAMI.2015.2430335.
An important aspect of human perception is anticipation, which we use extensively in our day-to-day activities when interacting with other humans as well as with our surroundings. Anticipating which activities a human will do next (and how) can enable an assistive robot to plan ahead for reactive responses. Furthermore, anticipation can even improve the detection accuracy of past activities. The challenge, however, is two-fold: we need to capture the rich context for modeling the activities and object affordances, and we need to anticipate the distribution over a large space of future human activities. In this work, we represent each possible future using an anticipatory temporal conditional random field (ATCRF) that models the rich spatial-temporal relations through object affordances. We then consider each ATCRF as a particle and represent the distribution over the potential futures using a set of particles. In an extensive evaluation on the CAD-120 human activity RGB-D dataset, we first show that anticipation improves the state-of-the-art detection results. We then show that for new subjects (not seen in the training set), we obtain an activity anticipation accuracy (defined as whether one of the top three predictions actually happened) of 84.1, 74.4 and 62.2 percent for an anticipation time of 1, 3 and 10 seconds respectively. Finally, we also show a robot using our algorithm to perform a few reactive responses.
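The anticipation accuracy above counts a prediction as correct when the activity that actually happened is among the top three ranked predictions. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `top_k_anticipation_accuracy`, the label strings, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_anticipation_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    actual_labels: Sequence[str],
    k: int = 3,
) -> float:
    """Fraction of cases whose actual label appears in the top-k predictions.

    Each entry of ranked_predictions is assumed to be sorted from most to
    least likely; only its first k entries are considered.
    """
    if len(ranked_predictions) != len(actual_labels):
        raise ValueError("predictions and labels must have the same length")
    if not actual_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, actual in zip(ranked_predictions, actual_labels)
        if actual in ranked[:k]
    )
    return hits / len(actual_labels)


# Toy, made-up data purely for illustration (not taken from CAD-120).
predictions = [
    ["reaching", "moving", "pouring"],
    ["drinking", "placing", "reaching"],
    ["opening", "closing", "moving"],
    ["eating", "cleaning", "placing", "null"],
]
actual = ["moving", "pouring", "opening", "null"]

accuracy = top_k_anticipation_accuracy(predictions, actual)
```

In this toy data the first and third cases are hits, the second is a miss, and the fourth is a miss because the actual label is ranked fourth, outside the top three.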