ModeRNN: Harnessing Spatiotemporal Mode Collapse in Unsupervised Predictive Learning.

Authors

Yao Zhiyu, Wang Yunbo, Wu Haixu, Wang Jianmin, Long Mingsheng

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):13281-13296. doi: 10.1109/TPAMI.2023.3293145. Epub 2023 Oct 3.

DOI: 10.1109/TPAMI.2023.3293145
PMID: 37428670
Abstract

Learning predictive models for unlabeled spatiotemporal data is challenging in part because visual dynamics can be highly entangled, especially in real scenes. In this paper, we refer to the multi-modal output distribution of predictive learning as spatiotemporal modes. We find an experimental phenomenon named spatiotemporal mode collapse (STMC) on most existing video prediction models, that is, features collapse into invalid representation subspaces due to the ambiguous understanding of mixed physical processes. We propose to quantify STMC and explore its solution for the first time in the context of unsupervised predictive learning. To this end, we present ModeRNN, a decoupling-aggregation framework that has a strong inductive bias of discovering the compositional structures of spatiotemporal modes between recurrent states. We first leverage a set of dynamic slots with independent parameters to extract individual building components of spatiotemporal modes. We then perform a weighted fusion of slot features to adaptively aggregate them into a unified hidden representation for recurrent updates. Through a series of experiments, we show high correlation between STMC and the fuzzy prediction results of future video frames. Besides, ModeRNN is shown to better mitigate STMC and achieve the state of the art on five video prediction datasets.
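The decoupling-aggregation idea in the abstract (a set of slots with independent parameters, followed by a weighted fusion into one hidden state) can be illustrated with a toy recurrent cell. This is only a minimal sketch under stated assumptions, not the ModeRNN architecture itself: the abstract does not specify the layer types, the gating function, or how the fusion weights are computed, so the class name `SlotFusionCell`, the dense `tanh` slots, and the softmax gate `gate_w` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


class SlotFusionCell:
    """Toy recurrent cell (illustrative assumption, not the paper's model).

    Decoupling: each slot has its own weight matrix and bias.
    Aggregation: slot features are combined by input-dependent weights.
    """

    def __init__(self, input_dim, hidden_dim, num_slots, seed=0):
        rng = np.random.default_rng(seed)
        joint_dim = input_dim + hidden_dim
        scale = 1.0 / np.sqrt(joint_dim)
        # One independent parameter set per slot.
        self.slot_w = rng.normal(0.0, scale, (num_slots, joint_dim, hidden_dim))
        self.slot_b = np.zeros((num_slots, hidden_dim))
        # Hypothetical gate that scores each slot for the current input/state.
        self.gate_w = rng.normal(0.0, scale, (joint_dim, num_slots))

    def step(self, x, h):
        """Run one recurrent update; returns the new state and fusion weights."""
        z = np.concatenate([x, h], axis=-1)                # (batch, joint_dim)
        # Per-slot features computed with independent parameters.
        slots = np.tanh(
            np.einsum("bi,sih->bsh", z, self.slot_w) + self.slot_b
        )                                                  # (batch, slots, hidden)
        # Adaptive fusion weights, one per slot, summing to 1.
        weights = softmax(z @ self.gate_w, axis=-1)        # (batch, slots)
        # Weighted fusion into a single hidden representation.
        h_next = np.einsum("bs,bsh->bh", weights, slots)   # (batch, hidden)
        return h_next, weights


# Usage: unroll the cell over a short random sequence.
cell = SlotFusionCell(input_dim=8, hidden_dim=16, num_slots=4)
x_seq = np.random.default_rng(1).normal(size=(5, 2, 8))   # (time, batch, input)
h = np.zeros((2, 16))
for x in x_seq:
    h, w = cell.step(x, h)
```

Returning the fusion weights alongside the state makes it easy to inspect which slots dominate at each step. In this toy setting that is the closest analogue to checking whether the representation collapses onto a single mode.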

