Lavechin Marvin, de Seyssel Maureen, Titeux Hadrien, Wisniewski Guillaume, Bredin Hervé, Cristia Alejandrina, Dupoux Emmanuel
GIPSA-lab, Université Grenoble Alpes, Grenoble, France.
Laboratoire de Sciences Cognitives et de Psycholinguistique, Département d'Études Cognitives, ENS, EHESS, CNRS, PSL University, Paris, France.
Dev Sci. 2025 Mar;28(2):e13606. doi: 10.1111/desc.13606.
Before they even talk, infants become sensitive to the speech sounds of their native language and recognize the auditory form of an increasing number of words. Traditionally, these early perceptual changes are attributed to an emerging knowledge of linguistic categories such as phonemes or words. However, there is growing skepticism surrounding this interpretation due to limited evidence of category knowledge in infants. Previous modeling work has shown that a distributional learning algorithm could reproduce perceptual changes in infants' early phonetic learning without acquiring phonetic categories. Taking this inquiry further, we propose that linguistic categories may not be needed for early word learning. We introduce STELA, a predictive coding algorithm designed to extract statistical patterns from continuous raw speech data. Our findings demonstrate that STELA can reproduce some developmental patterns of phonetic and word form learning without relying on linguistic categories such as phonemes or words, and without requiring explicit word segmentation. Through an analysis of the learned representations, we show evidence that linguistic categories may emerge as an end product of learning rather than being prerequisites during early language acquisition.