Department of Psychology, The University of Texas at Austin, 108 E Dean Keeton St, Austin, TX, 78712, USA.
Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA.
Behav Res Methods. 2023 Sep;55(6):3187-3197. doi: 10.3758/s13428-022-01961-x. Epub 2022 Sep 9.
Human infant crying evolved as a signal to elicit parental care and actively influences caregiving behaviors as well as infant-caregiver interactions. Automated cry detection algorithms have become more popular in recent decades, and while some models exist, they have not been evaluated thoroughly on daylong naturalistic audio recordings. Here, we validate a novel deep learning cry detection model by testing it in assessment scenarios important to developmental researchers. We also evaluate the deep learning model's performance relative to LENA's cry classifier, one of the most commonly used commercial software systems for quantifying child crying. Broadly, we found that both deep learning and LENA model outputs showed convergent validity with human annotations of infant crying. However, the deep learning model had substantially higher accuracy metrics (recall, F1, kappa) and stronger correlations with human annotations at all timescales tested (24 h, 1 h, and 5 min) relative to LENA. On average, LENA underestimated infant crying by 50 min every 24 h relative to human annotations and the deep learning model. Additionally, daily infant crying times detected by both automated models were lower than parent-report estimates in the literature. We provide recommendations and solutions for leveraging automated algorithms to detect infant crying in the home and make our training data and model code open source and publicly available.
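The accuracy metrics named above (recall, F1, kappa) can be illustrated with a minimal sketch. It assumes that human annotations and model outputs are both available as equal-length sequences of binary labels (1 = crying, 0 = not crying) over fixed-duration frames. The frame representation, the function names, and the toy data are illustrative assumptions, not the authors' released code or evaluation pipeline.

```python
from typing import List, Sequence, Tuple


def confusion_counts(human: Sequence[int], model: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating the human labels as the reference."""
    if len(human) != len(model):
        raise ValueError("label sequences must have the same length")
    tp = fp = fn = tn = 0
    for h, m in zip(human, model):
        if h and m:
            tp += 1
        elif m:
            fp += 1
        elif h:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp: int, fn: int) -> float:
    """Fraction of human-annotated cry frames that the model also flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement between the two label sequences, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def window_totals(labels: Sequence[int], frames_per_window: int) -> List[int]:
    """Sum cry frames per window, e.g. to compare totals at a coarser timescale."""
    return [
        sum(labels[i:i + frames_per_window])
        for i in range(0, len(labels), frames_per_window)
    ]


# Toy data (hypothetical, not from the study): ten frames of labels.
human = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
model = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(human, model)
print(recall(tp, fn), f1_score(tp, fp, fn), cohen_kappa(tp, fp, fn, tn))
print(window_totals(human, 5), window_totals(model, 5))
```

Window totals computed this way for each labeling source could then be correlated at whatever timescale is of interest (the abstract reports 24 h, 1 h, and 5 min). The window length in frames depends on the frame duration, which is not specified here.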