UNIMEMnet: Learning long-term motion and appearance dynamics for video prediction with a unified memory network.

Authors

Dai Kuai, Li Xutao, Luo Chuyao, Chen Wuqiao, Ye Yunming, Feng Shanshan

Affiliation

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, Guangdong, China.

Publication information

Neural Netw. 2023 Nov;168:256-271. doi: 10.1016/j.neunet.2023.09.024. Epub 2023 Sep 21.

DOI:10.1016/j.neunet.2023.09.024
PMID:37774512
Abstract

As a pixel-wise dense forecasting task, video prediction is challenging because of its high computational complexity, large future uncertainty, and extremely complicated spatial-temporal patterns. Many deep learning methods have been proposed for the task and have brought significant improvements. However, they focus on modeling short-term spatial-temporal dynamics and fail to sufficiently exploit long-term ones. As a result, these methods tend to perform unsatisfactorily when long-term forecasts are required. In this article, we propose a novel unified memory network (UNIMEMnet) for long-term video prediction, which effectively exploits long-term motion-appearance dynamics and unifies short-term and long-term spatial-temporal dynamics in a single architecture. In UNIMEMnet, a dual-branch multi-scale memory module is designed to extract and preserve long-term spatial-temporal patterns. In addition, a short-term spatial-temporal dynamics module and an alignment and fusion module are devised to capture short-term motion-appearance dynamics and coordinate them with the long-term ones from the memory module. Extensive experiments on five video prediction datasets from both synthetic and real-world scenarios validate the effectiveness of UNIMEMnet and its superiority over state-of-the-art methods.
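The abstract does not say how the memory module is read or how the short-term and long-term features are fused, so the following is only a generic illustration of the idea, not the authors' implementation. It assumes a learned memory stored as a matrix of slots, an attention-style read, and a simple scalar gate for fusion; the names `read_memory`, `fuse`, and `gate_logit` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def read_memory(query, memory):
    """Attention-style read from a memory bank (assumed mechanism).

    query:  (n, d) short-term features, one row per spatial location.
    memory: (m, d) memory slots holding long-term patterns.
    Returns the retrieved long-term features (n, d) and the
    attention weights (n, m).
    """
    scores = query @ memory.T / np.sqrt(query.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ memory, weights


def fuse(short_term, long_term, gate_logit=0.0):
    """Blend short-term and retrieved long-term features.

    A single scalar sigmoid gate is an assumption made for brevity;
    a real model would typically learn a per-location gate.
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return gate * short_term + (1.0 - gate) * long_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short = rng.normal(size=(4, 8))   # 4 locations, 8 channels
    bank = rng.normal(size=(6, 8))    # 6 memory slots
    long_feat, attn = read_memory(short, bank)
    fused = fuse(short, long_feat)
    print(fused.shape, attn.sum(axis=1))
```

Each row of the attention weights sums to one, so every location retrieves a convex combination of memory slots before the gate mixes it with the short-term features.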


Cited by

1
Four-phase CT lesion recognition based on multi-phase information fusion framework and spatiotemporal prediction module.
Biomed Eng Online. 2024 Oct 21;23(1):103. doi: 10.1186/s12938-024-01297-x.