
Gaze Estimation Network Based on Multi-Head Attention, Fusion, and Interaction

Authors

Li Changli, Li Fangfang, Zhang Kao, Chen Nenglun, Pan Zhigeng

Affiliation

School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China.

Publication

Sensors (Basel). 2025 Mar 18;25(6):1893. doi: 10.3390/s25061893.

DOI: 10.3390/s25061893
PMID: 40293029
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11945386/
Abstract

Gaze is an externally observable indicator of human visual attention, and thus, recording the gaze position can help to solve many problems. Existing gaze estimation models typically utilize separate neural network branches to process data streams from both eyes and the face, failing to fully exploit their feature correlations. This study presents a gaze estimation network that integrates multi-head attention mechanisms, fusion, and interaction strategies to fuse facial features with eye features, as well as features from both eyes, separately. Specifically, multi-head attention and channel attention are used to fuse features from both eyes, and a face and eye interaction module is designed to highlight the most important facial features guided by the eye features; in addition, the channel attention in the Convolutional Block Attention Module (CBAM) is replaced with minimum pooling instead of maximum pooling, and a shortcut connection is added to enhance the network's attention to eye region details. Comparative experiments on three public datasets-Gaze360, MPIIFaceGaze, and EYEDIAP-validate the superiority of the proposed method.
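The abstract's modification of the CBAM channel-attention step (min pooling replacing max pooling, with an added shortcut connection) can be sketched roughly as follows. This is an illustrative NumPy re-implementation under stated assumptions, not the authors' code: the reduction ratio, layer sizes, and random stand-in weights are all hypothetical.

```python
import numpy as np

def channel_attention_min(x):
    """CBAM-style channel attention, but with min pooling in place of
    max pooling and an identity shortcut, as the abstract describes.
    This is a hypothetical sketch; weights are random stand-ins for
    learned parameters.

    x: feature map of shape (C, H, W).
    Returns the channel-reweighted map plus the shortcut connection.
    """
    C = x.shape[0]
    # Global average and global MIN pooling over the spatial dims.
    avg_pool = x.mean(axis=(1, 2))   # shape (C,)
    min_pool = x.min(axis=(1, 2))    # shape (C,)

    # Shared two-layer MLP with a reduction ratio (assumed 4 here).
    rng = np.random.default_rng(0)
    r = max(C // 4, 1)
    W1 = rng.standard_normal((r, C)) * 0.1
    W2 = rng.standard_normal((C, r)) * 0.1
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)  # ReLU hidden layer

    # Sigmoid over the summed pooled descriptors -> per-channel weights.
    w = 1.0 / (1.0 + np.exp(-(mlp(avg_pool) + mlp(min_pool))))

    # Reweight channels, then add the shortcut (identity) connection.
    return x * w[:, None, None] + x

feat = np.ones((8, 4, 4))
out = channel_attention_min(feat)
print(out.shape)  # (8, 4, 4)
```

The intuition for the swap, per the abstract, is that min pooling together with the shortcut sharpens the network's attention to fine eye-region detail; the rest of the block (shared MLP, sigmoid gating) follows the standard CBAM channel-attention pattern.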


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/7f9dd7a55c68/sensors-25-01893-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/c110bc69e01a/sensors-25-01893-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/77d00e9a5717/sensors-25-01893-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/a0f4813de4d5/sensors-25-01893-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/577e194123cb/sensors-25-01893-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/8ef509513027/sensors-25-01893-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/8c4abe17718b/sensors-25-01893-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/ff8878b766ae/sensors-25-01893-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/6e9f6db2f5fc/sensors-25-01893-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/2dbbfdb0fc6b/sensors-25-01893-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/6a46195c0ad2/sensors-25-01893-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/c8185f287e27/sensors-25-01893-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/7f9dd7a55c68/sensors-25-01893-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/c110bc69e01a/sensors-25-01893-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/77d00e9a5717/sensors-25-01893-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/a0f4813de4d5/sensors-25-01893-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/577e194123cb/sensors-25-01893-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/8ef509513027/sensors-25-01893-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/8c4abe17718b/sensors-25-01893-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/ff8878b766ae/sensors-25-01893-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/6e9f6db2f5fc/sensors-25-01893-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/2dbbfdb0fc6b/sensors-25-01893-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/6a46195c0ad2/sensors-25-01893-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/c8185f287e27/sensors-25-01893-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba97/11945386/7f9dd7a55c68/sensors-25-01893-g012.jpg

Similar Articles

1
Gaze Estimation Network Based on Multi-Head Attention, Fusion, and Interaction.
Sensors (Basel). 2025 Mar 18;25(6):1893. doi: 10.3390/s25061893.
2
FreeGaze: A Framework for 3D Gaze Estimation Using Appearance Cues from a Facial Video.
Sensors (Basel). 2023 Dec 4;23(23):9604. doi: 10.3390/s23239604.
3
Complementary effects of gaze direction and early saliency in guiding fixations during free viewing.
J Vis. 2014 Nov 4;14(13):3. doi: 10.1167/14.13.3.
4
Multiview Multitask Gaze Estimation With Deep Convolutional Neural Networks.
IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):3010-3023. doi: 10.1109/TNNLS.2018.2865525. Epub 2018 Sep 3.
5
Gaze Estimation Approach Using Deep Differential Residual Network.
Sensors (Basel). 2022 Jul 21;22(14):5462. doi: 10.3390/s22145462.
6
Eyes always attract attention but gaze orienting is task-dependent: evidence from eye movement monitoring.
Neuropsychologia. 2007 Mar 14;45(5):1019-28. doi: 10.1016/j.neuropsychologia.2006.09.004. Epub 2006 Oct 24.
7
Gaze-in-wild: A dataset for studying eye and head coordination in everyday activities.
Sci Rep. 2020 Feb 13;10(1):2539. doi: 10.1038/s41598-020-59251-5.
8
A Gaze Estimation Method Based on Spatial and Channel Reconstructed ResNet Combined with Multi-Clue Fusion.
J Imaging. 2025 Mar 27;11(4):99. doi: 10.3390/jimaging11040099.
9
An integrated neural network model for eye-tracking during human-computer interaction.
Math Biosci Eng. 2023 Jun 21;20(8):13974-13988. doi: 10.3934/mbe.2023622.
10
Head-eye interactions during vertical gaze shifts made by rhesus monkeys.
Exp Brain Res. 2005 Dec;167(4):557-70. doi: 10.1007/s00221-005-0051-9. Epub 2005 Aug 13.

Cited By

1
Dual Focus-3D: A Hybrid Deep Learning Approach for Robust 3D Gaze Estimation.
Sensors (Basel). 2025 Jun 30;25(13):4086. doi: 10.3390/s25134086.

References

1
Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation.
Conf Comput Vis Pattern Recognit Workshops. 2024 Jun;2024:604-614. doi: 10.1109/cvprw63382.2024.00065. Epub 2024 Sep 27.
2
Appearance-Based Gaze Estimation With Deep Learning: A Review and Benchmark.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):7509-7528. doi: 10.1109/TPAMI.2024.3393571. Epub 2024 Nov 6.
3
Automatic Gaze Analysis: A Survey of Deep Learning Based Approaches.
IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):61-84. doi: 10.1109/TPAMI.2023.3321337. Epub 2023 Dec 5.
4
Appearance-Based Gaze Estimation for ASD Diagnosis.
IEEE Trans Cybern. 2022 Jul;52(7):6504-6517. doi: 10.1109/TCYB.2022.3165063. Epub 2022 Jul 4.
5
Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation.
IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):1174-1188. doi: 10.1109/TPAMI.2022.3148386. Epub 2022 Dec 5.
6
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation.
IEEE Trans Pattern Anal Mach Intell. 2019 Jan;41(1):162-175. doi: 10.1109/TPAMI.2017.2778103. Epub 2017 Nov 28.