Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis.

Affiliations

Language Development Department, Max Planck Institute for Psycholinguistics, The Netherlands.

Research School of Psychology, The Australian National University, Australia.

Publication information

Lang Speech. 2022 Mar;65(1):3-27. doi: 10.1177/0023830920979016. Epub 2021 Jan 30.

DOI:10.1177/0023830920979016
PMID:33517856
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8886305/
Abstract

To acquire language, infants must learn to segment words from running speech. A significant body of experimental research shows that infants use multiple cues to do so; however, little research has comprehensively examined the distribution of such cues in naturalistic speech. We conducted a comprehensive corpus analysis of German child-directed speech (CDS) using data from the Child Language Data Exchange System (CHILDES) database, investigating the availability of word stress, transitional probabilities (TPs), and lexical and sublexical frequencies as potential cues for word segmentation. Seven hours of data (~15,000 words) were coded, representing around an average day of speech to infants. The analysis revealed that for 97% of words, primary stress was carried by the initial syllable, implicating stress as a reliable cue to word onset in German CDS. Word identity was also marked by TPs between syllables, which were higher within than between words, and higher for backwards than forwards transitions. Words followed a Zipfian-like frequency distribution, and over two-thirds of words (78%) were monosyllabic. Of the 50 most frequent words, 82% were function words, which accounted for 47% of word tokens in the entire corpus. Finally, 15% of all utterances comprised single words. These results give rich novel insights into the availability of segmentation cues in German CDS, and support the possibility that infants draw on multiple converging cues to segment their input. The data, which we make openly available to the research community, will help guide future experimental investigations on this topic.
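The forward and backward transitional probabilities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definitions, forward TP(X→Y) = count(XY) / count(X) and backward TP(X→Y) = count(XY) / count(Y), and it normalises by how often a syllable appears in that pair position, which is a simplification. The function name and the three toy utterances are hypothetical; they are not taken from the CHILDES data, and this is not presented as the authors' actual procedure.

```python
from collections import Counter


def transitional_probabilities(utterances):
    """Return (forward, backward) TP dicts keyed by adjacent syllable pairs.

    Each utterance is a list of syllables; pairs never span utterances.
    """
    pair_counts = Counter()
    first_counts = Counter()   # how often a syllable is the first member of a pair
    second_counts = Counter()  # how often a syllable is the second member of a pair
    for syllables in utterances:
        for x, y in zip(syllables, syllables[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
            second_counts[y] += 1
    # Forward TP: given X, how likely is Y to follow?
    forward = {p: c / first_counts[p[0]] for p, c in pair_counts.items()}
    # Backward TP: given Y, how likely is X to precede?
    backward = {p: c / second_counts[p[1]] for p, c in pair_counts.items()}
    return forward, backward


# Hypothetical, hand-syllabified toy input (illustration only).
toy_utterances = [
    ["das", "ba", "by"],
    ["ein", "ba", "by"],
    ["das", "au", "to"],
]

forward, backward = transitional_probabilities(toy_utterances)
for pair in sorted(forward):
    print(pair, round(forward[pair], 2), round(backward[pair], 2))
```

In this toy input the within-word pair ("ba", "by") has a forward TP of 1.0, while the between-word pair ("das", "ba") has a forward TP of 0.5. A corpus of three utterances is far too small to support any conclusion; it only shows how the two measures are computed.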

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6286/8886305/216900d02241/10.1177_0023830920979016-fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6286/8886305/c9cfe1624cc5/10.1177_0023830920979016-fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6286/8886305/f83940cddc2b/10.1177_0023830920979016-fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6286/8886305/da69ac928133/10.1177_0023830920979016-fig4.jpg

Similar articles

1
Word Segmentation Cues in German Child-Directed Speech: A Corpus Analysis.
Lang Speech. 2022 Mar;65(1):3-27. doi: 10.1177/0023830920979016. Epub 2021 Jan 30.
2
Prosody outweighs statistics in 6-month-old German-learning infants' speech segmentation.
Infancy. 2024 Sep-Oct;29(5):750-770. doi: 10.1111/infa.12593. Epub 2024 May 4.
3
Early Speech Segmentation in French-learning Infants: Monosyllabic Words versus Embedded Syllables.
Lang Speech. 2015 Sep;58(Pt 3):334-50. doi: 10.1177/0023830914551375.
4
Infants' statistical word segmentation in an artificial language is linked to both parental speech input and reported production abilities.
Dev Sci. 2019 Jul;22(4):e12803. doi: 10.1111/desc.12803. Epub 2019 Feb 22.
5
Nine-month-olds use frequency of onset clusters to segment novel words.
J Exp Child Psychol. 2016 Aug;148:131-41. doi: 10.1016/j.jecp.2016.04.004. Epub 2016 May 12.
6
Familiar units prevail over statistical cues in word segmentation.
Psychol Res. 2017 Sep;81(5):990-1003. doi: 10.1007/s00426-016-0793-y. Epub 2016 Aug 31.
7
Harmonic cues for speech segmentation: a cross-linguistic corpus study on child-directed speech.
J Child Lang. 2014 Mar;41(2):439-61. doi: 10.1017/S0305000912000724. Epub 2013 Feb 21.
8
Modeling the contribution of phonotactic cues to the problem of word segmentation.
J Child Lang. 2010 Jun;37(3):487-511. doi: 10.1017/S030500090999050X. Epub 2010 Mar 22.
9
Early syllabic segmentation of fluent speech by infants acquiring French.
PLoS One. 2013 Nov 7;8(11):e79646. doi: 10.1371/journal.pone.0079646. eCollection 2013.
10
Words and syllables in fluent speech segmentation by French-learning infants: an ERP study.
Brain Res. 2010 May 21;1332:75-89. doi: 10.1016/j.brainres.2010.03.047. Epub 2010 Mar 21.

Cited by

1
How Do Enriched Speech Acoustics Support Language Acquisition in Children With Hearing Loss? A Narrative Review.
Ear Hear. 2025;46(3):551-562. doi: 10.1097/AUD.0000000000001606. Epub 2024 Dec 10.
2
Infants show systematic rhythmic motor responses while listening to rhythmic speech.
Front Psychol. 2024 Jun 17;15:1370007. doi: 10.3389/fpsyg.2024.1370007. eCollection 2024.
3
Infants' sensitivity to phonotactic regularities related to perceptually low-salient fricatives: a cross-linguistic study.
Front Psychol. 2024 Mar 6;15:1367240. doi: 10.3389/fpsyg.2024.1367240. eCollection 2024.
4
Statistical learning at a virtual cocktail party.
Psychon Bull Rev. 2024 Apr;31(2):849-861. doi: 10.3758/s13423-023-02384-1. Epub 2023 Oct 2.
5
Neural Tracking in Infancy Predicts Language Development in Children With and Without Family History of Autism.
Neurobiol Lang (Camb). 2022 Aug 17;3(3):495-514. doi: 10.1162/nol_a_00074. eCollection 2022.
6
Real-world statistics at two timescales and a mechanism for infant learning of object names.
Proc Natl Acad Sci U S A. 2022 May 3;119(18):e2123239119. doi: 10.1073/pnas.2123239119. Epub 2022 Apr 28.
7
Can Menzerath's law be a criterion of complexity in communication?
PLoS One. 2021 Aug 20;16(8):e0256133. doi: 10.1371/journal.pone.0256133. eCollection 2021.

References

1
The Longitudinal Relationship Between Conversational Turn-Taking and Vocabulary Growth in Early Language Development.
Child Dev. 2021 Mar;92(2):609-625. doi: 10.1111/cdev.13511. Epub 2021 Feb 6.
2
Exploring the "anchor word" effect in infants: Segmentation and categorisation of speech with and without high frequency words.
PLoS One. 2020 Dec 17;15(12):e0243436. doi: 10.1371/journal.pone.0243436. eCollection 2020.
3
Change in maternal speech rate to preverbal infants over the first two years of life.
J Child Lang. 2020 Nov;47(6):1263-1275. doi: 10.1017/S030500091900093X. Epub 2020 Mar 11.
4
Early Language Experience in a Tseltal Mayan Village.
Child Dev. 2020 Sep;91(5):1819-1835. doi: 10.1111/cdev.13349. Epub 2019 Dec 31.
5
Cross-situational learning in a Zipfian environment.
Cognition. 2019 Aug;189:11-22. doi: 10.1016/j.cognition.2019.03.005. Epub 2019 Mar 20.
6
Linguistic entrenchment: Prior knowledge impacts statistical learning performance.
Cognition. 2018 Aug;177:198-213. doi: 10.1016/j.cognition.2018.04.011. Epub 2018 Apr 26.
7
The coefficient of determination and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.
J R Soc Interface. 2017 Sep;14(134). doi: 10.1098/rsif.2017.0213. Epub 2017 Sep 13.
8
Canalization of Language Structure From Environmental Constraints: A Computational Model of Word Learning From Multiple Cues.
Top Cogn Sci. 2017 Jan;9(1):21-34. doi: 10.1111/tops.12239. Epub 2016 Dec 18.
9
Co-occurrence statistics as a language-dependent cue for speech segmentation.
Dev Sci. 2017 May;20(3). doi: 10.1111/desc.12390. Epub 2016 May 4.
10
Headstart for speech segmentation: a neural signature for the anchor word effect.
Neuropsychologia. 2016 Feb;82:189-199. doi: 10.1016/j.neuropsychologia.2016.01.011. Epub 2016 Jan 11.