Crane Center for Early Childhood Research and Policy, The Ohio State University, Columbus, OH, United States of America.
Educational Psychology Program, Department of Educational Studies, The Ohio State University, Columbus, OH, United States of America.
PLoS One. 2020 Nov 25;15(11):e0242511. doi: 10.1371/journal.pone.0242511. eCollection 2020.
The present study explored whether a tool for automatic detection and recognition of interactions and child-directed speech (CDS) in preschool classrooms could be developed, validated, and applied to non-coded video recordings representing children's classroom experiences. Using first-person video recordings collected by 13 preschool children during a morning in their classrooms, we extracted high-level audiovisual features from the recordings using automatic speech recognition and computer vision services from a cloud computing provider. Using manual coding of interactions and transcriptions of CDS as reference, we trained and tested supervised classifiers and linear mappings to measure five variables of interest. We show that supervised classifiers trained with speech activity, proximity, and high-level facial features achieve adequate accuracy in detecting interactions. Furthermore, in combination with an automatic speech recognition service, the supervised classifier achieved error rates for CDS measures that are in line with other open-source automatic decoding tools in early childhood settings. Finally, we demonstrate our tool's applicability by using it to automatically code and transcribe children's interactions and CDS exposure vertically within a classroom day (morning to afternoon) and horizontally over time (fall to winter). Developing and scaling tools for automated capture of children's interactions with others in the preschool classroom, as well as exposure to CDS, may revolutionize scientific efforts to identify precise mechanisms that foster young children's language development.
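The interaction-detection step described above can be illustrated with a minimal, hypothetical sketch: a supervised classifier trained on per-segment audiovisual features such as speech activity, proximity, and face-derived measures. The feature names, the synthetic data, the toy labeling rule, and the choice of a random forest are all illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch (not the authors' code): detecting "interaction"
# segments from high-level audiovisual features using a supervised classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic per-segment features (assumed, for illustration):
# [speech_activity, proximity, face_size, face_count]
n = 400
X = rng.random((n, 4))

# Toy labeling rule: call a segment an "interaction" when speech
# co-occurs with close physical proximity (purely illustrative).
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.4)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Evaluate on held-out segments, mirroring the train/test validation
# against manually coded reference data described in the abstract.
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```

In the study itself the features come from cloud ASR and computer vision services and the labels from manual coding; the sketch only shows the general shape of training and validating such a classifier.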