IEEE Trans Image Process. 2022;31:4502-4514. doi: 10.1109/TIP.2022.3181486. Epub 2022 Jul 4.
Most existing human parsing methods still face a challenge: how to effectively extract an accurate foreground from similar or cluttered scenes. In this paper, we propose a Grammar-induced Wavelet Network (GWNet) to address this challenge. GWNet consists mainly of two modules: a blended grammar-induced module and a wavelet prediction module. We design the blended grammar-induced module to exploit the relationships among different human parts and the inherent hierarchical structure of the human body by means of grammar rules applied in both cascaded and parallel manners. In this way, conspicuous parts, which are easily distinguished from the background, can amend the segmentation of inconspicuous ones, improving foreground extraction. We also design a Part-aware Convolutional Recurrent Neural Network (PCRNN) to pass the messages generated by the grammar rules. To further improve performance, we propose a wavelet prediction module that captures the basic structure and the edge details of a person by decomposing features into low-frequency and high-frequency components. The low-frequency component represents smooth structures, and the high-frequency components describe fine details. We conduct extensive experiments to evaluate GWNet on the PASCAL-Person-Part, LIP, and PPSS datasets. GWNet obtains state-of-the-art performance on these human parsing datasets.
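The split of a feature map into one low-frequency component and several high-frequency components can be illustrated with a single-level 2D Haar wavelet transform. The sketch below is only an illustration of that general idea, not the wavelet prediction module of GWNet: the abstract does not state which wavelet is used, so the choice of Haar, the function names `haar_dwt2` and `haar_idwt2`, and the use of NumPy are all assumptions made here.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns the low-frequency band and a tuple of three high-frequency bands,
    each of shape (H/2, W/2). Illustrative helper, not taken from the paper.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0          # block average (scaled): smooth structure
    row_detail = (a + b - c - d) / 2.0   # top row minus bottom row of the block
    col_detail = (a - b + c - d) / 2.0   # left column minus right column
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return low, (row_detail, col_detail, diag_detail)


def haar_idwt2(low, highs):
    """Inverse of haar_dwt2: rebuild the (H, W) array from the four bands."""
    row_detail, col_detail, diag_detail = highs
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (low + row_detail + col_detail + diag_detail) / 2.0
    x[0::2, 1::2] = (low + row_detail - col_detail - diag_detail) / 2.0
    x[1::2, 0::2] = (low - row_detail + col_detail - diag_detail) / 2.0
    x[1::2, 1::2] = (low - row_detail - col_detail + diag_detail) / 2.0
    return x
```

On a constant input the three high-frequency bands are zero and all of the signal sits in the low-frequency band, while an input containing a sharp boundary produces nonzero detail coefficients in the blocks that straddle the boundary. The transform is invertible, so no information is lost by the split.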