Fu Yongjian, Wu Xintian, Li Xi, Pan Zhijie, Luo Daxin
IEEE Trans Image Process. 2020 May 6. doi: 10.1109/TIP.2020.2991510.
Unlike many other facial attributes, expression varies continuously, so a slight semantic change in the input should produce only a correspondingly small fluctuation in the output. This consistency is important. However, current Facial Expression Recognition (FER) datasets may suffer from extreme class imbalance, data scarcity, and excessive label noise, which undermine this consistency and degrade test-time performance. In this paper, we consider not only prediction accuracy at individual sample points but also the smoothness of predictions in their neighborhoods, focusing on the stability of the output under slight semantic perturbations of the input. We propose a novel method that formulates semantic perturbations and selects unreliable samples during training, mitigating their adverse effects. Experiments demonstrate the effectiveness of the proposed method, and state-of-the-art results are reported, narrowing the gap to an upper limit by 30% relative to previous state-of-the-art methods on AffectNet, currently the largest in-the-wild FER database.
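The stability notion above can be illustrated with a minimal sketch: score a sample by how much a model's prediction fluctuates in its input neighborhood, and treat high-fluctuation samples as candidates for down-weighting. All names here are illustrative, and random Gaussian noise is only a crude stand-in for the paper's semantic perturbations, which this sketch does not reproduce.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def neighborhood_fluctuation(predict, x, eps=0.05, n_probes=8, seed=0):
    """Average L2 distance between the prediction at x and predictions
    at randomly perturbed neighbors of x. A small value means the model
    is locally smooth around this sample; a large value flags it as a
    candidate unreliable sample."""
    rng = np.random.default_rng(seed)
    p0 = predict(x)
    dists = []
    for _ in range(n_probes):
        noise = rng.normal(scale=eps, size=x.shape)
        dists.append(np.linalg.norm(predict(x + noise) - p0))
    return float(np.mean(dists))

# Toy stand-in "model": linear logits followed by softmax.
W = np.array([[1.0, -1.0], [0.5, 2.0]])
predict = lambda x: softmax(x @ W)

score = neighborhood_fluctuation(predict, np.array([0.3, -0.2]))
# Samples whose score exceeds a chosen threshold could then be
# down-weighted or excluded during training.
```

In this framing, per-sample accuracy and neighborhood smoothness are complementary objectives: a sample may be classified correctly yet sit in a region where predictions swing sharply under tiny perturbations.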