School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China.
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
PLoS One. 2023 May 1;18(5):e0284814. doi: 10.1371/journal.pone.0284814. eCollection 2023.
Gaze estimation plays a critical role in human-centered vision applications such as human-computer interaction and virtual reality. Although deep convolutional neural networks have made significant progress in automatic gaze estimation, deep learning-based gaze estimation models remain difficult to deploy directly across different edge devices because of their high computational cost and the devices' varied resource constraints. This work proposes LiteGaze, a deep learning framework that learns architectures for efficient gaze estimation via neural architecture search (NAS). Inspired by the once-for-all model (Cai et al., 2020), it decouples model training and architecture search into two stages: a supernet is first trained to support diverse architectural settings, and specialized sub-networks are then selected from the trained supernet under different efficiency constraints. Extensive experiments on two gaze estimation datasets demonstrate the superiority of the proposed method over previous works, advancing real-time gaze estimation on edge devices.
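The two-stage idea (train a supernet once, then search for sub-networks that fit a given efficiency budget) can be sketched in miniature. The search space, cost proxy, and scoring function below are all hypothetical stand-ins, not the paper's actual architecture or metrics: `score` plays the role of validation accuracy with weights inherited from the supernet, and `cost` plays the role of a FLOPs or latency estimate.

```python
import random

# Hypothetical search space: each of NUM_STAGES stages picks a block depth
# and a channel-width multiplier, as in once-for-all-style supernets.
DEPTHS = [2, 3, 4]
WIDTHS = [0.5, 0.75, 1.0]
NUM_STAGES = 4

def sample_subnet(rng):
    """Sample one sub-network configuration from the supernet's search space."""
    return [(rng.choice(DEPTHS), rng.choice(WIDTHS)) for _ in range(NUM_STAGES)]

def cost(config):
    """Toy efficiency proxy (FLOPs-like): grows with depth and width squared."""
    return sum(d * w * w for d, w in config)

def score(config):
    """Stand-in for the validation accuracy of a sub-network evaluated with
    weights inherited from the trained supernet (no retraining). Here it is
    just a deterministic toy function that rewards capacity."""
    return sum(d * w for d, w in config)

def search(budget, n_samples=1000, seed=0):
    """Random search over sub-networks: keep the best-scoring configuration
    whose estimated cost fits the given efficiency budget."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_samples):
        cfg = sample_subnet(rng)
        if cost(cfg) <= budget and score(cfg) > best_score:
            best, best_score = cfg, score(cfg)
    return best

# Different budgets yield different specialized sub-networks from the
# same (already trained) supernet -- no additional training needed.
tight = search(budget=4.0)   # e.g. a low-power edge device
loose = search(budget=12.0)  # e.g. a more capable device
```

Because the supernet is trained only once, deploying to a new device just reruns the cheap search step with that device's budget, which is the source of the method's efficiency over per-device NAS.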