Lee James Ren, Wang Linda, Wong Alexander
Vision and Image Processing Lab, Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada.
DarwinAI Corp., Waterloo, ON, Canada.
Front Artif Intell. 2021 Jan 13;3:609673. doi: 10.3389/frai.2020.609673. eCollection 2020.
While recent advances in deep learning have led to significant improvements in facial expression classification (FEC), a major challenge that remains a bottleneck for the widespread deployment of such systems is their high architectural and computational complexities. This is especially challenging given the operational requirements of various FEC applications, such as safety, marketing, learning, and assistive living, where real-time operation on low-cost embedded devices is desired. Motivated by this need for a compact, low-latency, yet accurate system capable of performing FEC in real-time on low-cost embedded devices, this study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy, where human experience is combined with machine meticulousness and speed in order to craft a deep neural network design catered toward real-time embedded usage. To the best of the authors' knowledge, this is the very first deep neural network architecture for facial expression recognition leveraging machine-driven design exploration in its design process, and it exhibits unique architectural characteristics, such as high architectural heterogeneity and selective long-range connectivity, not seen in previous FEC network architectures. Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy. Experimental results using the CK+ facial expression benchmark dataset demonstrate that the proposed EmotionNet Nano networks achieved accuracy comparable to state-of-the-art FEC networks, while requiring significantly fewer parameters.
Furthermore, we demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g., >25 FPS and >70 FPS at 15 and 30 W, respectively) and high energy efficiency (e.g., >1.7 images/sec/watt at 15 W) on an ARM embedded processor, thus further illustrating the efficacy of EmotionNet Nano for deployment on embedded devices.
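The energy-efficiency figures quoted above combine the two reported measurements (frame rate and power draw) into a single images/sec/watt metric. A minimal sketch of that calculation, using illustrative numbers rather than the paper's exact measurements:

```python
def energy_efficiency(fps: float, power_watts: float) -> float:
    """Energy efficiency as images processed per second per watt of power drawn.

    fps: measured inference throughput in frames (images) per second.
    power_watts: power draw of the processor at that operating point.
    """
    return fps / power_watts


# Illustrative values only (not the paper's exact measurements):
# a device sustaining 30 FPS at a 15 W power budget yields
# 30 / 15 = 2.0 images/sec/watt.
print(energy_efficiency(30.0, 15.0))
```

This makes explicit why a higher power budget can raise throughput while lowering efficiency: throughput enters the numerator, but power enters the denominator.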