

Multi-Scale Spatio-Temporal Feature Extraction and Depth Estimation from Sequences by Ordinal Classification.

Affiliations

School of Digital Media & Design Arts, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Beijing Key Laboratory of Network System and Network Culture, Beijing 100876, China.

Publication Information

Sensors (Basel). 2020 Apr 1;20(7):1979. doi: 10.3390/s20071979.

DOI: 10.3390/s20071979
PMID: 32244820
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7180695/
Abstract

Depth estimation is a key problem in 3D computer vision and has a wide variety of applications. In this paper we explore whether a deep learning network can predict depth maps accurately by learning multi-scale spatio-temporal features from sequences and by recasting depth estimation from a regression task into an ordinal classification task. We design an encoder-decoder network with several multi-scale strategies to improve its performance, and extract spatio-temporal features with ConvLSTM. Our experiments show that the proposed method improves error metrics by almost 10% and accuracy metrics by up to 2%. The results also indicate that extracting spatio-temporal features can dramatically improve performance on the depth estimation task. We plan to extend this work to a self-supervised setting to remove the dependence on large-scale labeled data.
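The ordinal-classification reformulation described in the abstract follows the general DORN-style recipe: discretize the depth range into ordered bins, typically with spacing-increasing (log-uniform) thresholds, have the network predict per pixel the probability that the true depth exceeds each threshold, and decode depth by counting the "greater-than" decisions. A minimal NumPy sketch of the discretization and decoding steps; the bin count, depth range, and geometric-mean decoding rule here are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np

def sid_thresholds(d_min, d_max, K):
    """Spacing-increasing discretization: K+1 log-spaced depth thresholds,
    so bins are narrow at near range and wide at far range."""
    return np.exp(np.linspace(np.log(d_min), np.log(d_max), K + 1))

def ordinal_decode(prob_gt, thresholds):
    """Decode a depth map from ordinal predictions.
    prob_gt: (K, H, W) array, prob_gt[k] = P(depth > thresholds[k]).
    The ordinal label per pixel is the number of thresholds the depth is
    predicted to exceed; depth is the geometric mean of the two
    thresholds bracketing that label."""
    k = (prob_gt > 0.5).sum(axis=0)  # (H, W) ordinal label per pixel
    k_next = np.clip(k + 1, 0, len(thresholds) - 1)
    return np.sqrt(thresholds[k] * thresholds[k_next])
```

Training then reduces to K per-pixel binary classifications, which is the reason ordinal formulations tend to be better conditioned than direct depth regression.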

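The spatio-temporal extractor named in the abstract, ConvLSTM, replaces the matrix multiplications of an ordinary LSTM cell with convolutions, so the hidden and cell states keep their spatial layout across time. A self-contained NumPy sketch of a single ConvLSTM time step, following Shi et al.'s formulation without peephole connections; the shapes and kernel size are illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d(x, w):
    """'Same'-padded, stride-1 convolution. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):          # accumulate one kernel tap at a time
        for j in range(k):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def convlstm_step(x, h, c, w_x, w_h, b):
    """One ConvLSTM step: the i, f, g, o gates come from convolutions over
    the input frame and the previous hidden state.
    x: (C_in, H, W); h, c: (C_hid, H, W);
    w_x: (4*C_hid, C_in, k, k); w_h: (4*C_hid, C_hid, k, k); b: (4*C_hid,)."""
    gates = conv2d(x, w_x) + conv2d(h, w_h) + b[:, None, None]
    i, f, g, o = np.split(gates, 4, axis=0)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # gated cell-state update
    h_new = sigmoid(o) * np.tanh(c_new)               # new hidden state
    return h_new, c_new
```

Running this step over a sequence of encoder feature maps and passing the final hidden state to the decoder is the usual way such a cell is embedded in an encoder-decoder depth network.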

Similar Articles

1. Multi-Scale Spatio-Temporal Feature Extraction and Depth Estimation from Sequences by Ordinal Classification.
   Sensors (Basel). 2020 Apr 1;20(7):1979. doi: 10.3390/s20071979.
2. Automatic Extraction of Water and Shadow from SAR Images Based on a Multi-Resolution Dense Encoder and Decoder Network.
   Sensors (Basel). 2019 Aug 16;19(16):3576. doi: 10.3390/s19163576.
3. Deep learning with 4D spatio-temporal data representations for OCT-based force estimation.
   Med Image Anal. 2020 Aug;64:101730. doi: 10.1016/j.media.2020.101730. Epub 2020 May 23.
4. Spatio-Temporal Convolutional LSTMs for Tumor Growth Prediction by Learning 4D Longitudinal Patient Data.
   IEEE Trans Med Imaging. 2020 Apr;39(4):1114-1126. doi: 10.1109/TMI.2019.2943841. Epub 2019 Sep 25.
5. Fast Depth Estimation in a Single Image Using Lightweight Efficient Neural Network.
   Sensors (Basel). 2019 Oct 13;19(20):4434. doi: 10.3390/s19204434.
6. Spatio-temporal layers based intra-operative stereo depth estimation network via hierarchical prediction and progressive training.
   Comput Methods Programs Biomed. 2024 Feb;244:107937. doi: 10.1016/j.cmpb.2023.107937. Epub 2023 Nov 22.
7. Superb Monocular Depth Estimation Based on Transfer Learning and Surface Normal Guidance.
   Sensors (Basel). 2020 Aug 27;20(17):4856. doi: 10.3390/s20174856.
8. SFA-MDEN: Semantic-Feature-Aided Monocular Depth Estimation Network Using Dual Branches.
   Sensors (Basel). 2021 Aug 13;21(16):5476. doi: 10.3390/s21165476.
9. Audio-visual multi-modality driven hybrid feature learning model for crowd analysis and classification.
   Math Biosci Eng. 2023 May 25;20(7):12529-12561. doi: 10.3934/mbe.2023558.
10. Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video.
    Sensors (Basel). 2022 Mar 28;22(7):2573. doi: 10.3390/s22072573.
