

Static-dynamic class-level perception consistency in video semantic segmentation.

Authors

Cen Zhigang, Guo Ningyan, Xu Wenjing, Feng Zhiyong, Huang Danlan

Affiliation

Beijing University of Posts and Telecommunications, Beijing, 100876, China.


Publication information

Neural Netw. 2025 Aug 7;192:107953. doi: 10.1016/j.neunet.2025.107953.

DOI: 10.1016/j.neunet.2025.107953
PMID: 40795503
Abstract

Video semantic segmentation (VSS) is widely used in fields such as simultaneous localization and mapping, autonomous driving, and surveillance. Its core challenge is how to leverage temporal information to improve segmentation. Previous work has focused primarily on pixel-level matching of static and dynamic contexts, using techniques such as optical flow and attention mechanisms. This paper instead rethinks static-dynamic contexts at the class level and proposes a novel static-dynamic class-level perceptual consistency (SD-CPC) framework. Within this framework, we propose multivariate class prototypes with contrastive learning and a static-dynamic semantic alignment module. The former provides class-level constraints for the model, yielding personalized inter-class features and diversified intra-class features. The latter first establishes intra-frame spatial multi-scale and multi-level correlations to achieve static semantic alignment. Then, based on cross-frame differences in static perception, it performs two-stage cross-frame selective aggregation to achieve dynamic semantic alignment. We also propose a novel window-based method for computing attention maps that exploits the sparsity of cross-frame attention points and the Hadamard product, reducing the computational cost of cross-frame attention aggregation. The proposed method achieves 51.1 mIoU on the VSPW dataset using MiT-B5, and 81.6 mIoU and 78.2 mIoU on the Cityscapes and CamVid datasets, respectively, using ResNet101. These results surpass those of existing state-of-the-art methods. Our implementation will be open-sourced on GitHub.
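
The results above are reported in mIoU (mean intersection over union). For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and the `ignore_index=255` default is an assumption borrowed from a common labeling convention for unlabeled pixels.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) label pairs; rows are ground truth, columns are predictions.

    Assumes every kept label lies in [0, num_classes). Pixels whose target
    equals ignore_index (an assumed convention) are skipped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index
    # Encode each (target, pred) pair as a single integer and histogram it.
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Average per-class IoU over classes that appear in pred or target."""
    tp = np.diag(conf).astype(np.float64)
    # union = predicted pixels + ground-truth pixels - intersection
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0  # skip classes absent from both pred and target
    return float((tp[valid] / union[valid]).mean())


if __name__ == "__main__":
    # Toy 2x2 label maps with two classes.
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 1]]
    conf = confusion_matrix(pred, target, num_classes=2)
    print(f"{100 * mean_iou(conf):.1f}")  # prints 58.3
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.583, which benchmarks conventionally report as 58.3. In practice the confusion matrix is accumulated over every frame of the evaluation set before the per-class IoUs are averaged.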

