Lin Guangchen, Li Songyuan, Chen Yifeng, Li Xi
IEEE Trans Image Process. 2024;33:1487-1496. doi: 10.1109/TIP.2023.3234499. Epub 2024 Feb 21.
Traditional CNN-based pipelines for panoptic segmentation decompose the task into two subtasks, i.e., instance segmentation and semantic segmentation. In this way, they extract information with multiple branches, perform the two subtasks separately, and finally fuse the results. However, the excessive feature extraction and complicated processing make them time-consuming. We propose IDNet, which decomposes panoptic segmentation at the information level. IDNet extracts only two kinds of information and completes the panoptic segmentation task directly, saving the effort of extracting extra information and fusing subtask results. By decomposing panoptic segmentation into category information and location information and recomposing them with a serial pipeline, the panoptic segmentation process is greatly simplified and unified with regard to stuff and things. We also adopt two correction losses specially designed for our serial pipeline, guaranteeing overall prediction performance. As a result, IDNet strikes a better balance between effectiveness and efficiency, achieving the fastest inference speed of 24.2 FPS at a resolution of 800×1333 on a Tesla V100 GPU with a PQ of 43.8, which is comparable to one-stage CNN-based methods. The code will be released at https://github.com/AronLin/IDNet.
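To make the information-level decomposition concrete, the following is a minimal sketch of how per-pixel category information and per-pixel location information could be serially recomposed into a panoptic output. All names, shapes, and the centre-voting scheme here are illustrative assumptions, not IDNet's actual implementation; the abstract only specifies that the two kinds of information are extracted and recombined in a serial pipeline.

```python
import numpy as np

def compose_panoptic(category_logits, center_offsets, stuff_classes):
    """Hypothetical serial recomposition (not IDNet's actual code).

    category_logits: (C, H, W) per-pixel class scores (category information).
    center_offsets:  (2, H, W) per-pixel offsets to an instance centre
                     (location information), rows = (dy, dx).
    stuff_classes:   set of class ids treated as stuff (no instance ids).
    Returns a per-pixel class map and a per-pixel instance-id map
    (0 = stuff / no instance), which together form a panoptic labelling.
    """
    H, W = category_logits.shape[1:]
    class_map = category_logits.argmax(axis=0)  # category info -> classes

    # Each pixel votes for an instance centre by adding its predicted
    # offset to its own coordinates; pixels sharing a centre share an id.
    ys, xs = np.mgrid[0:H, 0:W]
    cy = np.clip(np.round(ys + center_offsets[0]), 0, H - 1).astype(int)
    cx = np.clip(np.round(xs + center_offsets[1]), 0, W - 1).astype(int)

    instance_map = np.zeros((H, W), dtype=int)
    centre_ids, next_id = {}, 1
    for y in range(H):
        for x in range(W):
            if class_map[y, x] in stuff_classes:
                continue  # stuff pixels: the class label alone suffices
            centre = (cy[y, x], cx[y, x])
            if centre not in centre_ids:
                centre_ids[centre] = next_id
                next_id += 1
            instance_map[y, x] = centre_ids[centre]
    return class_map, instance_map
```

Note how stuff and things pass through the same pipeline: both get a class from the category information, and only thing pixels additionally consume the location information, which mirrors the unified treatment the abstract describes.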