Zeng Zhiliang, Wu Mengyang, Zeng Wei, Fu Chi-Wing
IEEE Trans Image Process. 2020 Apr 15. doi: 10.1109/TIP.2020.2986894.
This paper presents a new approach to recognizing vanishing-point-constrained building planes from a single street-view image. We first design a novel convolutional neural network (CNN) architecture that produces a geometric segmentation of per-pixel orientations from the input image. The network combines two-stream features of general visual cues and surface normals in gated convolution layers, and employs a deeply supervised loss that encapsulates multi-scale convolutional features. Experiments on a new benchmark with fine-grained plane segmentations of real-world street views show that our network outperforms state-of-the-art methods for both semantic and geometric segmentation. The pixel-wise segmentation, however, exhibits coarse boundaries and discontinuities. We therefore rectify it into perspectively projected quads based on the spatial proximity between the segmentation masks and exterior line segments detected through image processing. Finally, we demonstrate how the results can be used to perspectively overlay images and icons on building planes in input photos and to provide visual cues for various applications.
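As an illustration of the final overlay step, the sketch below maps points of a flat image (in unit-square coordinates) onto a quad in the photo using the standard closed-form square-to-quad projective mapping. It is not taken from the paper: the function names `square_to_quad` and `project` are hypothetical, and the corner order (top-left, top-right, bottom-right, bottom-left) is an assumption made for this example.

```python
def square_to_quad(quad):
    """Return coefficients (a, b, c, d, e, f, g, h) of the projective map
    from the unit square to `quad`.

    `quad` lists four (x, y) corners that correspond, in order, to the
    unit-square corners (0,0), (1,0), (1,1), (0,1). This ordering is an
    assumption of this sketch. The quad must be non-degenerate.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dx2 = x1 - x2, x3 - x2
    dy1, dy2 = y1 - y2, y3 - y2
    sx = x0 - x1 + x2 - x3
    sy = y0 - y1 + y2 - y3

    den = dx1 * dy2 - dx2 * dy1
    if den == 0:
        raise ValueError("degenerate quad")

    # Perspective terms; both are zero when the quad is a parallelogram.
    g = (sx * dy2 - dx2 * sy) / den
    h = (dx1 * sy - sx * dy1) / den

    a = x1 - x0 + g * x1
    b = x3 - x0 + h * x3
    c = x0
    d = y1 - y0 + g * y1
    e = y3 - y0 + h * y3
    f = y0
    return (a, b, c, d, e, f, g, h)


def project(coeffs, u, v):
    """Map a point (u, v) in the unit square to image coordinates."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * u + h * v + 1.0
    return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)


# Example: a trapezoid standing in for a detected building plane.
facade = [(0.0, 0.0), (4.0, 0.0), (3.0, 2.0), (1.0, 2.0)]
coeffs = square_to_quad(facade)
corner = project(coeffs, 1.0, 1.0)  # lands on the third quad corner
```

To paste an icon onto the quad, one would sample the icon at each `(u, v)` and write the pixel at `project(coeffs, u, v)`, or invert the mapping and iterate over the destination pixels to avoid holes.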