Deep Learning Driven Visual Path Prediction From a Single Image.

Publication information

IEEE Trans Image Process. 2016 Dec;25(12):5892-5904. doi: 10.1109/TIP.2016.2613686. Epub 2016 Sep 26.

DOI:10.1109/TIP.2016.2613686

Abstract

Inference and prediction are significant capabilities of visual systems. Visual path prediction is an important and challenging task among them, with the goal of inferring the future path of a visual object in a static scene. The task is complicated because it requires a high-level semantic understanding of both the scene and the underlying motion patterns in video sequences. In practice, cluttered scenes place further demands on the effectiveness and robustness of models. Motivated by these observations, we propose a deep learning framework that performs deep feature learning for visual representation jointly with spatiotemporal context modeling. A unified path-planning scheme is then proposed to make accurate path predictions based on the analytic results returned by the deep context models. The effective visual representation and deep context models give our framework a deep semantic understanding of scenes and motion patterns, which improves performance on the visual path prediction task. In experiments, we extensively evaluate the model's performance on two large benchmark datasets that we construct by adapting video tracking datasets. The qualitative and quantitative experimental results show that our approach outperforms state-of-the-art approaches and generalizes better.

Similar articles

Revisiting Video Saliency Prediction in the Deep Learning Era.

IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):220-237. doi: 10.1109/TPAMI.2019.2924417. Epub 2020 Dec 4.

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation.

IEEE Trans Image Process. 2019 Apr;28(4):1720-1731. doi: 10.1109/TIP.2018.2881928. Epub 2018 Nov 16.

Jointly Feature Learning and Selection for Robust Tracking via a Gating Mechanism.

PLoS One. 2016 Aug 30;11(8):e0161808. doi: 10.1371/journal.pone.0161808. eCollection 2016.

A Review on Deep Learning Techniques for Video Prediction.

IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):2806-2826. doi: 10.1109/TPAMI.2020.3045007. Epub 2022 May 5.

Variational Structured Attention Networks for Deep Visual Representation Learning.

IEEE Trans Image Process. 2022 Mar 2;PP. doi: 10.1109/TIP.2021.3137647.

Robust Deep Co-Saliency Detection With Group Semantic and Pyramid Attention.

IEEE Trans Neural Netw Learn Syst. 2020 Jul;31(7):2398-2408. doi: 10.1109/TNNLS.2020.2967471. Epub 2020 Feb 13.

Feature Distilled Tracking.

IEEE Trans Cybern. 2019 Feb;49(2):440-452. doi: 10.1109/TCYB.2017.2776977. Epub 2017 Dec 7.

Optimizing latent graph representations of surgical scenes for unseen domain generalization.

Int J Comput Assist Radiol Surg. 2024 Jun;19(6):1243-1250. doi: 10.1007/s11548-024-03121-2. Epub 2024 Apr 28.

Learning deep hierarchical visual feature coding.

IEEE Trans Neural Netw Learn Syst. 2014 Dec;25(12):2212-25. doi: 10.1109/TNNLS.2014.2307532.

Cited by

Iterative Design and Prototyping of Computer Vision Mediated Remote Sighted Assistance.

ACM Trans Comput Hum Interact. 2022 Aug;29(4). doi: 10.1145/3501298. Epub 2022 Mar 31.

Editorial: Deep Learning for Toxicity and Disease Prediction.

Front Genet. 2020 Feb 26;11:175. doi: 10.3389/fgene.2020.00175. eCollection 2020.

Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals With High-Throughput Cell-Based Androgen Receptor Bioassay Data.

Front Physiol. 2019 Aug 13;10:1044. doi: 10.3389/fphys.2019.01044. eCollection 2019.
