Suppr 超能文献


Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment.

Authors

Cao Yuqin, Min Xiongkuo, Sun Wei, Zhai Guangtao

Publication

IEEE Trans Image Process. 2023;32:1882-1896. doi: 10.1109/TIP.2023.3251695. Epub 2023 Mar 21.

DOI: 10.1109/TIP.2023.3251695
PMID: 37030730
Abstract

With the popularity of mobile Internet, audio and video (A/V) have become the main way for people to entertain and socialize daily. However, in order to reduce the cost of media storage and transmission, A/V signals will be compressed by service providers before they are transmitted to end-users, which inevitably causes distortions in the A/V signals and degrades the end-user's Quality of Experience (QoE). This motivates us to research the objective audio-visual quality assessment (AVQA). In the field of AVQA, most previous works only focus on single-mode audio or visual signals, which ignores that the perceptual quality of users depends on both audio and video signals. Therefore, we propose an objective AVQA architecture for multi-mode signals based on attentional neural networks. Specifically, we first utilize an attention prediction model to extract the salient regions of video frames. Then, a pre-trained convolutional neural network is used to extract short-time features of the salient regions and the corresponding audio signals. Next, the short-time features are fed into Gated Recurrent Unit (GRU) networks to model the temporal relationship between adjacent frames. Finally, the fully connected layers are utilized to fuse the temporal related features of A/V signals modeled by the GRU network into the final quality score. The proposed architecture is flexible and can be applied to both full-reference and no-reference AVQA. Experimental results on the LIVE-SJTU Database and UnB-AVC Database demonstrate that our model outperforms the state-of-the-art AVQA methods. The code of the proposed method will be publicly available to promote the development of the field of AVQA.
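The pipeline described in the abstract (pretrained CNN features for salient video regions and audio, GRU temporal modeling per modality, fully connected fusion into a quality score) can be sketched as follows. This is a minimal NumPy illustration of the GRU and fusion stages only; all dimensions, the random initialization, and the single-layer fusion head are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(d_in, d_h):
    # Stacked parameters for the three GRU gates: update (z), reset (r), candidate.
    W = rng.normal(0.0, 0.1, (3, d_in, d_h))
    U = rng.normal(0.0, 0.1, (3, d_h, d_h))
    b = np.zeros((3, d_h))
    return W, U, b

def gru_step(x, h, params):
    W, U, b = params
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])              # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])              # reset gate
    h_cand = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])   # candidate state
    return (1 - z) * h + z * h_cand

def run_gru(seq, params, d_h):
    # Model the temporal relationship between adjacent frames.
    h = np.zeros(d_h)
    for x in seq:
        h = gru_step(x, h, params)
    return h

# Hypothetical per-frame features, standing in for pretrained-CNN outputs.
T, d_v, d_a, d_h = 8, 16, 12, 10
video_feats = rng.normal(size=(T, d_v))  # short-time features of salient regions
audio_feats = rng.normal(size=(T, d_a))  # short-time features of audio frames

gru_v = init_gru(d_v, d_h)
gru_a = init_gru(d_a, d_h)
h_v = run_gru(video_feats, gru_v, d_h)
h_a = run_gru(audio_feats, gru_a, d_h)

# Fully connected fusion of the temporally modeled A/V features into one score.
w_fc = rng.normal(0.0, 0.1, 2 * d_h)
score = float(np.concatenate([h_v, h_a]) @ w_fc)
```

In the full-reference setting, the same skeleton would consume reference/distorted feature differences instead of raw distorted features; the no-reference variant uses only the distorted signal, as above.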


Similar Articles

1. Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment.
IEEE Trans Image Process. 2023;32:1882-1896. doi: 10.1109/TIP.2023.3251695. Epub 2023 Mar 21.
2. Subjective and Objective Audio-Visual Quality Assessment for User Generated Content.
IEEE Trans Image Process. 2023;32:3847-3861. doi: 10.1109/TIP.2023.3290528. Epub 2023 Jul 14.
3. Study of Subjective and Objective Quality Assessment of Audio-Visual Signals.
IEEE Trans Image Process. 2020 Apr 21. doi: 10.1109/TIP.2020.2988148.
4. Temporal Reasoning Guided QoE Evaluation for Mobile Live Video Broadcasting.
IEEE Trans Image Process. 2021;30:3279-3292. doi: 10.1109/TIP.2021.3060255. Epub 2021 Mar 2.
5. Multi-Dimensional Feature Fusion Network for No-Reference Quality Assessment of In-the-Wild Videos.
Sensors (Basel). 2021 Aug 6;21(16):5322. doi: 10.3390/s21165322.
6. Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework.
J Imaging. 2023 Jun 26;9(7):130. doi: 10.3390/jimaging9070130.
7. No-Reference Video Quality Assessment Using Multi-Pooled, Saliency Weighted Deep Features and Decision Fusion.
Sensors (Basel). 2022 Mar 12;22(6):2209. doi: 10.3390/s22062209.
8. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9. A hybrid TCN-GRU model for classifying human activities using smartphone inertial signals.
PLoS One. 2024 Aug 13;19(8):e0304655. doi: 10.1371/journal.pone.0304655. eCollection 2024.
10. PMF-CNN: parallel multi-band fusion convolutional neural network for SSVEP-EEG decoding.
Biomed Phys Eng Express. 2024 Mar 8;10(3). doi: 10.1088/2057-1976/ad2e36.

Cited By

1. Character generation and visual quality enhancement in animated films using deep learning.
Sci Rep. 2025 Jul 2;15(1):23409. doi: 10.1038/s41598-025-07442-3.
2. Assessment of information quality and reliability on ankle sprains in short videos from Douyin and Bilibili.
Sci Rep. 2025 Jul 2;15(1):22654. doi: 10.1038/s41598-025-07656-5.
3. Enhancing left ventricular segmentation in echocardiography with a modified mixed attention mechanism in SegFormer architecture.
Heliyon. 2024 Jul 17;10(15):e34845. doi: 10.1016/j.heliyon.2024.e34845. eCollection 2024 Aug 15.
4. YDD-SLAM: Indoor Dynamic Visual SLAM Fusing YOLOv5 with Depth Information.
Sensors (Basel). 2023 Dec 3;23(23):9592. doi: 10.3390/s23239592.
5. Unsupervised blind image quality assessment via joint spatial and transform features.
Sci Rep. 2023 Jul 5;13(1):10865. doi: 10.1038/s41598-023-38099-5.