IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8494-8506. doi: 10.1109/TPAMI.2022.3232797.
Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications such as health care and behavior analysis. Despite advances in deep learning, it remains challenging. Solutions modeled on object recognition usually try to map pixels directly to semantics, but activity patterns differ substantially from object patterns, which has kept these solutions from repeating the success of object recognition. In this article, we propose a novel paradigm that reformulates the task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming the detected primitives with interpretable logic rules to infer semantics. To afford a representative primitive space, we build a knowledge base that includes more than 26 million primitive labels and logic rules obtained from human priors or automatic discovery. Our framework, the Human Activity Knowledge Engine (HAKE), shows better generalization and performance than canonical methods on challenging benchmarks. Code and data are available at http://hake-mvig.cn/.
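The two-stage formulation can be illustrated with a small sketch. This is not HAKE's implementation: the primitive names, the rules, the stubbed detector, and the min/max scoring are all assumptions made for illustration. Stage 1 is replaced by a function that returns fixed confidence scores, and stage 2 treats each rule as a conjunction of primitives, scored with a fuzzy AND (minimum) inside a rule and a fuzzy OR (maximum) across alternative rules.

```python
from typing import Dict, List

# Hypothetical primitive names and rules, invented for illustration only;
# they are not taken from the HAKE knowledge base.
Rule = List[str]  # one rule = a conjunction of primitive names

RULES: Dict[str, List[Rule]] = {
    "ride_bicycle": [["hip_sit_on", "hand_hold", "object_is_bicycle"]],
    "drink_from_cup": [["hand_hold", "head_drink_with", "object_is_cup"]],
}


def detect_primitives(image) -> Dict[str, float]:
    """Stage 1 stand-in (assumption): a real system would run a learned
    primitive detector on the pixels. This stub ignores its input and
    returns fixed confidence scores so the sketch stays runnable."""
    return {
        "hip_sit_on": 0.9,
        "hand_hold": 0.8,
        "object_is_bicycle": 0.95,
        "head_drink_with": 0.1,
        "object_is_cup": 0.05,
    }


def infer_activities(
    primitives: Dict[str, float],
    rules: Dict[str, List[Rule]] = RULES,
) -> Dict[str, float]:
    """Stage 2: combine primitive scores with logic rules.
    Fuzzy AND (min) within a rule, fuzzy OR (max) across alternative rules.
    A primitive that was not detected counts as 0.0."""
    scores: Dict[str, float] = {}
    for activity, alternatives in rules.items():
        scores[activity] = max(
            min(primitives.get(name, 0.0) for name in rule)
            for rule in alternatives
        )
    return scores


scores = infer_activities(detect_primitives(image=None))
best = max(scores, key=scores.get)
```

Because every activity score traces back to named primitives and an explicit rule, a prediction can be inspected by checking which primitive limited the score, which is the kind of interpretability the abstract attributes to rule-based inference.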