Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis.

Affiliations

Language Development Department, Max Planck Institute for Psycholinguistics, The Netherlands.

Research School of Psychology, The Australian National University, Australia.

Publication information

Lang Speech. 2022 Mar;65(1):3-27. doi: 10.1177/0023830920979016. Epub 2021 Jan 30.

Abstract

To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing around an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backwards than forwards transitions. Words followed a Zipfian-like frequency distribution, and over two-thirds of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results give rich novel insights into the availability of segmentation cues in German CDS, and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
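
The transitional-probability cue can be made concrete with a small sketch. The abstract does not state the exact formula the authors used, so this assumes the common definitions: forward TP(X→Y) = freq(XY) / freq(X) and backward TP(X→Y) = freq(XY) / freq(Y), computed over adjacent syllables within an utterance. The function name `syllable_tps` and the toy utterances are hypothetical illustrations, not data from the corpus.

```python
from collections import Counter


def syllable_tps(utterances):
    """Compute forward and backward transitional probabilities (TPs).

    `utterances` is a list of utterances; each utterance is a list of
    words; each word is a list of syllable strings.
    Assumed definitions (not confirmed by the source):
      forward  TP(X -> Y) = freq(XY) / freq(X)
      backward TP(X -> Y) = freq(XY) / freq(Y)
    """
    syllable_counts = Counter()
    pair_counts = Counter()
    for utterance in utterances:
        # Flatten the words into one syllable sequence per utterance;
        # pairs are never counted across utterance boundaries.
        syllables = [s for word in utterance for s in word]
        syllable_counts.update(syllables)
        pair_counts.update(zip(syllables, syllables[1:]))

    forward = {pair: n / syllable_counts[pair[0]]
               for pair, n in pair_counts.items()}
    backward = {pair: n / syllable_counts[pair[1]]
                for pair, n in pair_counts.items()}
    return forward, backward


# Invented toy input: three two-word utterances, pre-syllabified.
toy_utterances = [
    [["das"], ["ba", "by"]],
    [["das"], ["au", "to"]],
    [["ein"], ["ba", "by"]],
]

forward_tp, backward_tp = syllable_tps(toy_utterances)
print(forward_tp[("ba", "by")])   # within-word transition
print(forward_tp[("das", "ba")])  # between-word transition
```

In this toy input the within-word pair "ba"+"by" has a higher forward TP than the between-word pair "das"+"ba", which is the kind of contrast the abstract reports at corpus scale. Dividing by the overall syllable frequency is one of several possible choices for the denominator.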

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6286/8886305/216900d02241/10.1177_0023830920979016-fig1.jpg
