Erel Yotam, Shannon Katherine Adams, Chu Junyi, Scott Kim, Struhl Melissa Kline, Cao Peng, Tan Xincheng, Hart Peter, Raz Gal, Piccolo Sabrina, Mei Catherine, Potter Christine, Jaffe-Dax Sagi, Lew-Williams Casey, Tenenbaum Joshua, Fairchild Katherine, Bermano Amit, Liu Shari
The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv-Yafo, Israel.
Department of Psychology, Stanford University, Stanford, California.
Adv Methods Pract Psychol Sci. 2023 Apr-Jun;6(2). doi: 10.1177/25152459221147250. Epub 2023 Apr 18.
Technological advances in psychological research have enabled large-scale studies of human behavior and streamlined pipelines for automatic processing of data. However, studies of infants and children have not fully reaped these benefits because the behaviors of interest, such as gaze duration and direction, still have to be extracted from video through a laborious process of manual annotation, even when these data are collected online. Recent advances in computer vision raise the possibility of automated annotation of these video data. In this article, we built on a system for automatic gaze annotation in young children, iCatcher, by engineering improvements and then training and testing the system (referred to hereafter as iCatcher+) on three data sets with substantial video and participant variability (214 videos collected in U.S. lab and field sites, 143 videos collected in Senegal field sites, and 265 videos collected via webcams in homes; participant age range = 4 months-3.5 years). When trained on each of these data sets, iCatcher+ performed with near human-level accuracy on held-out videos at distinguishing "LEFT" versus "RIGHT" and "ON" versus "OFF" looking behavior across all data sets. This high performance was achieved at the level of individual frames, experimental trials, and study videos; held across participant demographics (e.g., age, race/ethnicity), participant behavior (e.g., movement, head position), and video characteristics (e.g., luminance); and generalized to a fourth, entirely held-out online data set. We close by discussing the next steps required to fully automate the life cycle of online infant and child behavioral studies, representing a key step toward enabling robust and high-throughput developmental research.
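To make concrete the kind of frame-level and trial-level measures the abstract refers to, the following is a minimal sketch of how per-frame gaze labels (e.g., "left", "right", "away") can be scored against human annotations and aggregated into trial-level looking times. The function names (frame_level_accuracy, trial_looking_time), the label vocabulary, and the 30-fps frame rate are illustrative assumptions for this sketch, not the iCatcher+ API or the authors' evaluation code.

```python
# Illustrative sketch: scoring per-frame gaze labels against human annotations
# and aggregating them into a trial-level looking-time measure. All names and
# values here are assumptions, not part of iCatcher+.

from collections import Counter

FPS = 30  # assumed video frame rate


def frame_level_accuracy(model_labels, human_labels):
    """Proportion of frames on which model and human annotations agree."""
    assert len(model_labels) == len(human_labels), "label sequences must align"
    agree = sum(m == h for m, h in zip(model_labels, human_labels))
    return agree / len(model_labels)


def trial_looking_time(labels, on_labels=("left", "right"), fps=FPS):
    """Total seconds within one trial that the child is coded as looking at the screen."""
    on_frames = sum(1 for lab in labels if lab in on_labels)
    return on_frames / fps


# Example: one short trial annotated by the model and by a human coder.
model = ["left"] * 20 + ["away"] * 5 + ["right"] * 15
human = ["left"] * 22 + ["away"] * 3 + ["right"] * 15

print(f"frame-level agreement: {frame_level_accuracy(model, human):.2%}")
print(f"model looking time:    {trial_looking_time(model):.2f} s")
print(f"human looking time:    {trial_looking_time(human):.2f} s")
print("model label counts:", Counter(model))
```

In this toy example the two coders disagree on only 2 of 40 frames, so frame-level agreement is high and the trial-level looking times differ by a fraction of a second, which is the sense in which frame-level accuracy propagates to trial- and video-level measures.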