Consequences of phonological variation for algorithmic word segmentation.

Affiliation

Department of Psychology, University of Pennsylvania, 425 S University Ave, Philadelphia, PA 19104, USA.

Publication information

Cognition. 2023 Jun;235:105401. doi: 10.1016/j.cognition.2023.105401. Epub 2023 Feb 12.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10085835/

Abstract

Over the first year, infants begin to learn the words of their language. Previous work suggests that certain statistical regularities in speech could help infants segment the speech stream into words, thereby forming a proto-lexicon that could support learning of the eventual vocabulary. However, computational models of word segmentation have typically been tested using language input that is much less variable than actual speech is. We show that using actual, transcribed pronunciations rather than dictionary pronunciations of the same speech leads to worse segmentation performance across models. We also find that phonologically variable input poses serious problems for lexicon building, because even correctly segmented word forms exhibit a complex, many-to-many relationship with speakers' intended words. Many phonologically distinct word forms were actually the same intended word, and many identical transcriptions came from different intended words. The fact that previous models appear to have substantially overestimated the utility of simple statistical heuristics suggests a need to consider the formation of the lexicon in infancy differently.
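The many-to-many relationship described in the abstract can be illustrated with a small sketch. The token pairs below are invented ASCII stand-ins for phonetic transcriptions, not data from the study, and the helper name `build_maps` is hypothetical. The sketch only shows how one might check a set of correctly segmented tokens for forms that map to several intended words and for words that surface in several forms.

```python
from collections import defaultdict

# Hypothetical (surface form, intended word) pairs. These are invented
# illustrations of reduced pronunciations, not data from the paper.
tokens = [
    ("and", "and"),
    ("an", "and"),
    ("n", "and"),
    ("an", "an"),
    ("n", "in"),
    ("in", "in"),
]


def build_maps(pairs):
    """Index tokens in both directions: form -> words and word -> forms."""
    form_to_words = defaultdict(set)
    word_to_forms = defaultdict(set)
    for form, word in pairs:
        form_to_words[form].add(word)
        word_to_forms[word].add(form)
    return form_to_words, word_to_forms


form_to_words, word_to_forms = build_maps(tokens)

# Forms that stand for more than one intended word (one-to-many).
ambiguous_forms = sorted(f for f, ws in form_to_words.items() if len(ws) > 1)

# Intended words that surface in more than one form (many-to-one).
variable_words = sorted(w for w, fs in word_to_forms.items() if len(fs) > 1)

print("ambiguous forms:", ambiguous_forms)
print("variable words:", variable_words)
```

When both lists are non-empty, as in this toy input, a learner cannot recover the lexicon by treating each distinct segmented form as one word, which is the difficulty the abstract points to.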

