Zhang Shansi, Zhao Yaping, Lam Edmund Y
IEEE Trans Image Process. 2024;33:4516-4528. doi: 10.1109/TIP.2024.3441930. Epub 2024 Aug 23.
Light field (LF) images enable numerous applications because they capture information from multiple views. Semantic segmentation is an essential task for LF scene understanding. However, existing supervised methods rely heavily on large numbers of pixel-wise annotations. To alleviate this problem, we propose a semi-supervised LF semantic segmentation method that requires only a small subset of labeled data and harnesses the LF disparity information. First, we design an unsupervised disparity estimation network that determines the disparity map for every view. With the estimated disparity maps, we generate pseudo-labels along with their weight maps for the peripheral views when only the labels of the central views are available. We then merge the predictions from multiple views to obtain more reliable pseudo-labels for unlabeled data, and introduce a disparity-semantics consistency loss to enforce structural similarity. Moreover, we develop a comprehensive contrastive learning scheme that includes a pixel-level strategy to enhance feature representations and an object-level strategy to improve segmentation for individual objects. Our method demonstrates state-of-the-art performance on the benchmark LF semantic segmentation dataset under a variety of training settings and achieves performance comparable to supervised methods when trained under the 1/2 protocol.
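The pseudo-label generation step for peripheral views can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `warp_center_labels`, the `IGNORE` index, the sign convention for the disparity shift, the nearest-neighbor sampling, and the binary in-bounds weight map are all assumptions made for illustration. The abstract does not specify how the weight maps are computed.

```python
import numpy as np

IGNORE = 255  # hypothetical index for pixels that receive no pseudo-label


def warp_center_labels(center_labels, disparity, du, dv):
    """Backward-warp central-view labels into one peripheral view.

    center_labels: (H, W) integer array of class indices (central view).
    disparity:     (H, W) float array, disparity of the peripheral view,
                   in pixels per unit of angular offset.
    du, dv:        angular offset of the peripheral view from the central
                   view (horizontal, vertical).
    Returns (pseudo_labels, weights), both of shape (H, W).
    """
    h, w = center_labels.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Assumed convention: peripheral pixel (x, y) corresponds to
    # (x + d * du, y + d * dv) in the central view.
    src_x = np.rint(xs + disparity * du).astype(int)
    src_y = np.rint(ys + disparity * dv).astype(int)

    # Pixels whose source location falls outside the central view
    # cannot be labeled by warping.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    pseudo = np.full((h, w), IGNORE, dtype=center_labels.dtype)
    pseudo[valid] = center_labels[src_y[valid], src_x[valid]]

    # Simplified weight map (assumption): 1 where a label was
    # transferred, 0 elsewhere.
    weights = valid.astype(np.float32)
    return pseudo, weights
```

In a training loop, a weight map of this kind would typically scale the per-pixel loss on the peripheral view so that pixels without a transferred label contribute nothing. A softer weighting, for example one based on disparity confidence, would be a natural refinement.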