Gilkerson Jill, Zhang Yiwen, Xu Dongxin, Richards Jeffrey A, Xu Xiaojuan, Jiang Fan, Harnsberger James, Topping Keith
J Speech Lang Hear Res. 2015 Apr;58(2):445-52. doi: 10.1044/2015_JSLHR-L-14-0014.
The purpose of this study was to evaluate the performance of the Language Environment Analysis (LENA) automated language-analysis system for the Chinese Shanghai dialect and Mandarin (SDM).
Volunteer parents of 22 children aged 3-23 months were recruited in Shanghai. Families provided daylong in-home audio recordings using LENA. A native speaker listened to 15 min of randomly selected audio samples per family to label speaker regions and provide Chinese character and SDM word counts for adult speakers. LENA segment labeling and counts were compared with rater-based values.
LENA demonstrated good sensitivity in identifying adult and child speakers; this sensitivity was comparable to that of American English validation samples. Precision was strong for adults but less so for children. LENA adult word count correlated strongly with both Chinese character counts and SDM word counts. LENA conversational turn counts correlated similarly with rater-based counts after the exclusion of three unusual samples. Performance was related to some degree to child age.
LENA adult word counts and conversational turn counts provided reasonably accurate estimates for SDM over the age range tested. Theoretical and practical considerations regarding LENA performance in non-English languages are discussed. Despite the pilot nature and other limitations of the study, results are promising for broader cross-linguistic applications.
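The agreement measures named in the abstract (sensitivity, precision, and correlation between automated and rater-based counts) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the per-segment label lists, and all numbers below are hypothetical, and the sketch assumes that rater and system labels have already been aligned segment by segment.

```python
from math import sqrt


def sensitivity_and_precision(rater, system, target):
    """Compare system labels against rater labels for one speaker class.

    Sensitivity: share of rater-labeled target segments the system also
    labeled as target. Precision: share of system-labeled target segments
    that the rater agreed with.
    """
    pairs = list(zip(rater, system))
    tp = sum(1 for r, s in pairs if r == target and s == target)
    fn = sum(1 for r, s in pairs if r == target and s != target)
    fp = sum(1 for r, s in pairs if r != target and s == target)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of counts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical, made-up labels for six aligned segments (not study data).
rater_labels = ["adult", "adult", "child", "other", "child", "adult"]
system_labels = ["adult", "other", "child", "child", "child", "adult"]

adult_sens, adult_prec = sensitivity_and_precision(rater_labels, system_labels, "adult")
child_sens, child_prec = sensitivity_and_precision(rater_labels, system_labels, "child")

# Hypothetical per-sample adult word counts: rater vs. automated system.
rater_counts = [10, 20, 30, 40]
system_counts = [12, 18, 33, 41]
r = pearson_r(rater_counts, system_counts)

print(f"adult sensitivity={adult_sens:.2f}, precision={adult_prec:.2f}")
print(f"child sensitivity={child_sens:.2f}, precision={child_prec:.2f}")
print(f"word-count correlation r={r:.3f}")
```

In this toy data the adult class shows lower sensitivity but perfect precision, while the child class shows the reverse. The values only illustrate how the two measures can diverge and carry no information about the study's results.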