IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3508-3522. doi: 10.1109/TPAMI.2021.3055780. Epub 2022 Jun 3.
Modeling human structure is central to human parsing, which extracts pixel-wise semantic information from images. We begin by analyzing three types of inference processes over the hierarchical structure of the human body: direct inference (predicting human semantic parts directly from image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). We then formulate the problem as a compositional neural information fusion (CNIF) framework, which combines the information from the three inference processes in a conditional manner, i.e., weighted by the confidence of each source. Building on CNIF, we further present a part-relation-aware human parser (PRHP), which precisely describes three kinds of human part relations, i.e., decomposition, composition, and dependency, with three distinct relation networks. Expressive relation information is captured by constraining the parameters of the relation networks to satisfy the specific geometric characteristics of the different relations. By combining generic message-passing networks with their edge-typed, convolutional counterparts, PRHP performs iterative reasoning over the human body hierarchy. With these efforts, PRHP provides a more general and powerful form of CNIF and lays the foundation for more sophisticated and flexible patterns of human relation reasoning. Experiments on five datasets demonstrate that both of our human parsers outperform the state of the art in all cases.
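The conditional fusion idea in CNIF can be illustrated with a minimal sketch: three per-node class-score estimates (from direct, bottom-up, and top-down inference) are combined with weights proportional to each source's confidence. All names, shapes, and the example numbers below are hypothetical illustrations, not the authors' implementation; the paper's actual fusion operates inside a neural network.

```python
import numpy as np

def fuse_conditional(estimates, confidences):
    """Confidence-weighted fusion of predictions for one body-hierarchy node.

    estimates   : list of (C,) arrays of class probabilities, one per
                  inference source (direct, bottom-up, top-down).
    confidences : list of non-negative scalars, one per source.
    Returns the fused (C,) probability vector.
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()  # normalize so the source weights sum to 1
    # Weighted sum of the stacked estimates along the source axis.
    return (w[:, None] * np.stack(estimates)).sum(axis=0)

# Hypothetical scores for a 3-class node (e.g., background / torso / arm).
direct    = np.array([0.7, 0.2, 0.1])
bottom_up = np.array([0.5, 0.4, 0.1])
top_down  = np.array([0.6, 0.3, 0.1])

fused = fuse_conditional([direct, bottom_up, top_down],
                         confidences=[0.9, 0.5, 0.6])
# fused stays a valid probability vector: components sum to 1.
```

Because each input is a probability vector and the weights are normalized, the fused output remains a valid distribution; a low-confidence source contributes proportionally less to the final prediction.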