Zhang Weidong, Zhang Wei, Gu Jason
IEEE Trans Cybern. 2020 Jun;50(6):2730-2739. doi: 10.1109/TCYB.2019.2895837. Epub 2019 Feb 21.
Visual cognition of the indoor environment can benefit from spatial layout estimation, which represents an indoor scene with a 2-D box on a monocular image. In this paper, we propose to fully exploit the edge and semantic information of a room image for layout estimation. More specifically, we present an encoder-decoder network with a shared encoder and two separate decoders, each composed of multiple deconvolution (transposed convolution) layers, to jointly learn the edge maps and semantic labels of a room image. We combine these two network predictions in a scoring function that evaluates the quality of layout hypotheses, which are generated by ray sampling and drawn from a predefined layout pool. Guided by the scoring function, we apply a novel refinement strategy to further optimize the layout hypotheses. Experimental results show that the proposed network yields accurate estimates of edge maps and semantic labels. By fully utilizing the two different types of labels, the proposed method achieves state-of-the-art layout estimation performance on the benchmark datasets.
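The idea of scoring layout hypotheses against both predictions can be illustrated with a minimal sketch. The abstract does not give the form of the scoring function, so everything below is an assumption made for illustration: the function names (`layout_edges`, `score_layout`, `best_layout`), the weighted-average form, and the `weight` parameter are hypothetical and are not the authors' method. A hypothesis is taken to be a grid of surface indices, and its score rewards agreement with the predicted edge map along the hypothesis boundaries and with the predicted semantic labels over the whole image.

```python
# Hypothetical sketch: score layout hypotheses against predicted edge and
# semantic maps. The scoring form and the weight are assumptions, not the
# paper's actual scoring function.

def layout_edges(labels):
    """Mark pixels whose right or lower neighbour has a different surface label."""
    h, w = len(labels), len(labels[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                edges[y][x] = True
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                edges[y][x] = True
    return edges


def score_layout(labels, edge_prob, sem_prob, weight=0.5):
    """Combine edge agreement and semantic agreement into one score.

    labels:    H x W grid of surface indices for one layout hypothesis
    edge_prob: H x W grid of predicted edge probabilities
    sem_prob:  C x H x W grid of predicted per-surface probabilities
    weight:    assumed trade-off between the two terms
    """
    h, w = len(labels), len(labels[0])
    edges = layout_edges(labels)
    # Mean predicted edge probability along the hypothesis boundaries.
    edge_vals = [edge_prob[y][x] for y in range(h) for x in range(w) if edges[y][x]]
    edge_term = sum(edge_vals) / len(edge_vals) if edge_vals else 0.0
    # Mean predicted probability of the surface the hypothesis assigns to each pixel.
    sem_term = sum(
        sem_prob[labels[y][x]][y][x] for y in range(h) for x in range(w)
    ) / (h * w)
    return weight * edge_term + (1.0 - weight) * sem_term


def best_layout(hypotheses, edge_prob, sem_prob, weight=0.5):
    """Pick the highest-scoring hypothesis from a candidate pool."""
    return max(hypotheses, key=lambda lab: score_layout(lab, edge_prob, sem_prob, weight))
```

In this sketch the candidate pool is simply a list of label grids; how the candidates are produced (ray sampling, a predefined pool) and how they are refined afterwards is outside its scope.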