The statistical signature of morphosyntax: a study of Hungarian and Italian infant-directed speech.

Affiliation

CNRS, Paris, France.

Publication information

Cognition. 2012 Nov;125(2):263-87. doi: 10.1016/j.cognition.2012.06.010. Epub 2012 Aug 6.

Abstract

Does statistical learning (Saffran, Aslin, & Newport, 1996) offer a universal segmentation strategy for young language learners? Previous studies on large corpora of English and structurally similar languages have shown that statistical segmentation can be an effective strategy. However, many of the world's languages have richer morphological systems, with sometimes several affixes attached to a stem (e.g. Hungarian: iskoláinkban: iskolá-i-nk-ban school.pl.poss1pl.inessive 'in our schools'). In these languages, word boundaries and morpheme boundaries do not coincide. Does the internal structure of words affect segmentation? What word forms does segmentation yield in morphologically rich languages: complex word forms or separate stems and affixes? The present paper answers these questions by exploring different segmentation algorithms in infant-directed speech corpora from two typologically and structurally different languages, Hungarian and Italian. The results suggest that the morphological and syntactic type of a language has an impact on statistical segmentation, with different strategies working best in different languages. Specifically, the direction of segmentation seems to be sensitive to the affixation order of a language. Thus, backward probabilities are more effective in Hungarian, a heavily suffixing language, whereas forward probabilities are more informative in Italian, which has fewer suffixes and a large number of phrase-initial function words. The consequences of these findings for potential segmentation and word learning strategies are discussed.
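The contrast between forward and backward probabilities can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the segmentation procedures, so the fixed-threshold boundary rule, the function names, and the toy syllable data below are all assumptions made for illustration. The forward transitional probability of a syllable pair (a, b) is taken as count(ab) / count(a as the first syllable of a pair), and the backward probability as count(ab) / count(b as the second syllable of a pair).

```python
from collections import Counter


def transitional_probabilities(utterances):
    """Return (forward, backward) transitional probabilities per syllable pair.

    utterances: list of utterances, each a list of syllable strings.
    forward[(a, b)]  = count(a b) / count(a as first syllable of a pair)
    backward[(a, b)] = count(a b) / count(b as second syllable of a pair)
    """
    pair_counts = Counter()
    left_counts = Counter()
    right_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
            right_counts[b] += 1
    forward = {p: c / left_counts[p[0]] for p, c in pair_counts.items()}
    backward = {p: c / right_counts[p[1]] for p, c in pair_counts.items()}
    return forward, backward


def segment(utt, tp, threshold):
    """Place a boundary wherever the pair's probability is below threshold.

    The fixed threshold is an assumption of this sketch, not a rule
    taken from the source.
    """
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = [b]
        else:
            current.append(b)
    words.append("".join(current))
    return words


# Invented toy data, not drawn from the Hungarian or Italian corpora.
toy_corpus = [["ba", "by", "go"], ["ba", "by", "see"], ["go", "ba", "by"]]
forward_tp, backward_tp = transitional_probabilities(toy_corpus)
print(segment(["ba", "by", "go"], forward_tp, 0.75))
print(segment(["ba", "by", "go"], backward_tp, 0.75))
```

In this toy data the forward probabilities dip after "by", so the forward pass splits the utterance there, while the backward probabilities stay high across the whole utterance. The example only shows that the two directions can disagree on the same input; it says nothing about which direction suits which language.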
