Chong Eunji, Clark-Whitney Elysha, Southerland Audrey, Stubbs Elizabeth, Miller Chanel, Ajodan Eliana L, Silverman Melanie R, Lord Catherine, Rozga Agata, Jones Rebecca M, Rehg James M
School of Interactive Computing, Georgia Institute of Technology, Atlanta, USA.
Center for Autism and the Developing Brain, Weill Cornell Medicine, New York, USA.
Nat Commun. 2020 Dec 14;11(1):6386. doi: 10.1038/s41467-020-19712-x.
Eye contact is among the most fundamental means of social communication used by humans. Quantifying eye contact is valuable for the analysis of social roles and communication skills, and for clinical screening. Estimating a subject's looking direction is a challenging task, but eye contact can be effectively captured by a wearable point-of-view camera, which provides a unique viewpoint. While moments of eye contact can be hand-coded from this viewpoint, such a process tends to be laborious and subjective. In this work, we develop a deep neural network model to automatically detect eye contact in egocentric video; it is the first to achieve accuracy equivalent to that of human experts. We train a deep convolutional network on a dataset of 4,339,879 annotated images from 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 validation subjects, and its performance is on par with 10 trained human coders, whose mean precision is 0.918 and recall 0.946. Our method will be instrumental in gaze behavior analysis by serving as a scalable, objective, and accessible tool for clinicians and researchers.
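The abstract does not give implementation details, but the approach it describes (a frame-level deep convolutional classifier for eye contact, evaluated with precision and recall against human coders) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released model: it assumes a torchvision ResNet-50 backbone, 224x224 face crops taken from the egocentric video, and a single binary eye-contact label per frame; the class name EyeContactClassifier and the training hyperparameters are hypothetical.

# Minimal sketch (assumptions noted above), not the authors' published model.
import torch
import torch.nn as nn
from torchvision import models

class EyeContactClassifier(nn.Module):
    """Hypothetical per-frame binary eye-contact classifier over face crops."""
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained backbone; final layer replaced with a single
        # logit for the eye-contact / no-eye-contact decision.
        self.backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, face_crops):           # face_crops: (B, 3, 224, 224)
        return self.backbone(face_crops)     # (B, 1) raw logits

model = EyeContactClassifier()
criterion = nn.BCEWithLogitsLoss()            # frame label: eye contact (1) or not (0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a batch of annotated frames.
frames = torch.randn(8, 3, 224, 224)          # placeholder face crops
labels = torch.randint(0, 2, (8, 1)).float()  # placeholder human annotations
optimizer.zero_grad()
loss = criterion(model(frames), labels)
loss.backward()
optimizer.step()

At evaluation time, per-frame sigmoid outputs would be thresholded and compared against human coder annotations to obtain precision and recall figures of the kind reported above.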