Laboratoire de Sciences Cognitives et de Psycholinguistique, Département d'Etudes cognitives, ENS, EHESS, CNRS, PSL University, Paris, France.
Interdisciplinary Centre for Science and Technology Studies (IZWT) Wuppertal, University of Wuppertal, Nordrhein-Westfalen, Germany.
Behav Res Methods. 2024 Dec;56(8):8588-8607. doi: 10.3758/s13428-024-02493-2. Epub 2024 Sep 20.
Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics such as adult word count (AWC) and child vocalization count (CVC), which have more recently also been derived using an open-source alternative, the ACLEW pipeline. Yet there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, the intraclass correlation coefficient attributed to child identity [Child ICC] was below 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpus. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.
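To make the reliability measure concrete, the following is a minimal sketch of how an intraclass correlation attributable to child identity could be estimated for a single metric. It is not the authors' analysis: it assumes a simple one-way random-effects ICC computed from ANOVA mean squares, and the function name `child_icc`, the child labels, and the AWC values are all hypothetical.

```python
from statistics import mean


def child_icc(recordings):
    """One-way random-effects ICC (often written ICC(1)).

    `recordings` maps a child identifier to the list of values of one
    metric (e.g. AWC), one value per recording of that child.
    Simplifying assumption: child identity is the only grouping factor.
    """
    groups = [values for values in recordings.values() if values]
    n_children = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_children < 2 or n_total <= n_children:
        raise ValueError("need at least 2 children and repeated recordings")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-child and within-child sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (n_children - 1)
    ms_within = ss_within / (n_total - n_children)

    # Effective number of recordings per child (handles unbalanced data).
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_children - 1)

    denominator = ms_between + (k0 - 1) * ms_within
    if denominator == 0:
        raise ValueError("metric has no variance; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical adult word counts, three recordings per child.
awc = {
    "child_A": [12000, 15000, 11000],
    "child_B": [20000, 17000, 22000],
    "child_C": [9000, 14000, 10000],
}
print(f"Child ICC for AWC: {child_icc(awc):.2f}")
```

A value near 1 means that most of the variance in the metric lies between children, so repeated recordings rank children consistently. A value below 0.5, as reported for most metrics here, means that more than half of the variance comes from sources other than stable child differences.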