
Going beyond still images to improve input variance resilience in multi-stream vision understanding models.

Authors

Fadaei Amir Hosein, Dehaqani Mohammad-Reza A

Affiliations

College of Engineering, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.

School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.

Publication

Sci Rep. 2024 Jul 4;14(1):15366. doi: 10.1038/s41598-024-66346-w.

DOI: 10.1038/s41598-024-66346-w
PMID: 38965359
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11224316/
Abstract

Traditionally, vision models have predominantly relied on spatial features extracted from static images, deviating from the continuous stream of spatiotemporal features processed by the brain in natural vision. While numerous video-understanding models have emerged, incorporating videos into image-understanding models with spatiotemporal features has been limited. Drawing inspiration from natural vision, which exhibits remarkable resilience to input changes, our research focuses on the development of a brain-inspired model for vision understanding trained with videos. Our findings demonstrate that models that train on videos instead of still images and include temporal features become more resilient to various alternations on input media.
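The abstract's central claim is that models trained on videos degrade less under alterations to the input than models trained on still images. A minimal, hypothetical sketch of how such a resilience comparison could be scored is below; the perturbation (Gaussian pixel noise) and all accuracy figures are invented for illustration and are not taken from the paper.

```python
import numpy as np

def accuracy_drop(clean_acc, perturbed_acc):
    """Relative accuracy drop under an input alteration; lower means more resilient."""
    return (clean_acc - perturbed_acc) / clean_acc

def add_gaussian_noise(frames, sigma, rng=None):
    """Corrupt a batch of frames (N, H, W, C, values in [0, 1]) with Gaussian
    pixel noise -- one example of the input alterations a resilience
    benchmark might apply."""
    rng = rng or np.random.default_rng(0)
    noisy = frames + rng.normal(0.0, sigma, size=frames.shape)
    return np.clip(noisy, 0.0, 1.0)

# Hypothetical accuracies: a video-trained model degrading less than an
# image-trained one under the same perturbation, as the abstract reports.
image_model = accuracy_drop(0.80, 0.55)   # trained on still images
video_model = accuracy_drop(0.78, 0.70)   # trained on videos
assert video_model < image_model          # smaller drop = more resilient
```

The design choice here is to compare *relative* rather than absolute drops, so that a model with slightly lower clean accuracy (as regularized video-trained models sometimes have) is not penalized for its starting point.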


Figures (Figs. 1-9):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/ebc9d1411185/41598_2024_66346_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/88ba2eb6928f/41598_2024_66346_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/1e72ca0a22e8/41598_2024_66346_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/1fa69315c5c1/41598_2024_66346_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/a5ac258acd22/41598_2024_66346_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/2484ff3f38db/41598_2024_66346_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/338bd6ddb90d/41598_2024_66346_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/956416201a3f/41598_2024_66346_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da6c/11224316/e3b80ffaf623/41598_2024_66346_Fig9_HTML.jpg

Similar Articles

1. Going beyond still images to improve input variance resilience in multi-stream vision understanding models.
   Sci Rep. 2024 Jul 4;14(1):15366. doi: 10.1038/s41598-024-66346-w.
2. An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2496-2509. doi: 10.1109/TNNLS.2022.3190367. Epub 2024 Feb 5.
3. Video Salient Object Detection via Fully Convolutional Networks.
   IEEE Trans Image Process. 2018;27(1):38-49. doi: 10.1109/TIP.2017.2754941.
4. MBT: Model-Based Transformer for retinal optical coherence tomography image and video multi-classification.
   Int J Med Inform. 2023 Oct;178:105178. doi: 10.1016/j.ijmedinf.2023.105178. Epub 2023 Aug 21.
5. Deep recurrent neural network reveals a hierarchy of process memory during dynamic natural vision.
   Hum Brain Mapp. 2018 May;39(5):2269-2282. doi: 10.1002/hbm.24006. Epub 2018 Feb 12.
6. What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations.
   Front Artif Intell. 2021 Dec 3;4:767971. doi: 10.3389/frai.2021.767971. eCollection 2021.
7. Sound Can Help Us See More Clearly.
   Sensors (Basel). 2022 Jan 13;22(2):599. doi: 10.3390/s22020599.
8. Bilinear pooling in video-QA: empirical challenges and motivational drift from neurological parallels.
   PeerJ Comput Sci. 2022 Jun 3;8:e974. doi: 10.7717/peerj-cs.974. eCollection 2022.
9. Minimal videos: Trade-off between spatial and temporal information in human and machine vision.
   Cognition. 2020 Aug;201:104263. doi: 10.1016/j.cognition.2020.104263. Epub 2020 Apr 20.
10. Stitched vision transformer for age-related macular degeneration detection using retinal optical coherence tomography images.
   PLoS One. 2024 Jun 5;19(6):e0304943. doi: 10.1371/journal.pone.0304943. eCollection 2024.

References Cited in This Article

1. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1437-1451. doi: 10.1109/TPAMI.2017.2711011. Epub 2017 Jun 1.
2. Long-Term Temporal Convolutions for Action Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1510-1517. doi: 10.1109/TPAMI.2017.2712608. Epub 2017 Jun 6.
3. Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition.
   Sci Rep. 2016 Sep 7;6:32672. doi: 10.1038/srep32672.
4. On the usefulness of 'what' and 'where' pathways in vision.
   Trends Cogn Sci. 2011 Oct;15(10):460-6. doi: 10.1016/j.tics.2011.08.005. Epub 2011 Sep 7.
5. A backward progression of attentional effects in the ventral stream.
   Proc Natl Acad Sci U S A. 2010 Jan 5;107(1):361-5. doi: 10.1073/pnas.0907658106. Epub 2009 Dec 10.
6. Parallel processing strategies of the primate visual system.
   Nat Rev Neurosci. 2009 May;10(5):360-72. doi: 10.1038/nrn2619. Epub 2009 Apr 8.
7. Two visual systems re-viewed.
   Neuropsychologia. 2008 Feb 12;46(3):774-85. doi: 10.1016/j.neuropsychologia.2007.10.005. Epub 2007 Oct 18.
8. Feedforward, horizontal, and feedback processing in the visual cortex.
   Curr Opin Neurobiol. 1998 Aug;8(4):529-35. doi: 10.1016/s0959-4388(98)80042-1.
9. Separate visual pathways for perception and action.
   Trends Neurosci. 1992 Jan;15(1):20-5. doi: 10.1016/0166-2236(92)90344-8.