School of Audiology and Speech-Language Pathology, The University of Memphis, 807 Jefferson Avenue, Memphis, Tennessee 38105, USA.
J Acoust Soc Am. 2010 Apr;127(4):2563-77. doi: 10.1121/1.3327460.
Acoustic analysis of infant vocalizations has typically employed traditional acoustic measures drawn from adult speech acoustics, such as f0, duration, formant frequencies, amplitude, and pitch perturbation. Here an alternative and complementary method is proposed in which data-derived spectrographic features are central. 1-s-long spectrograms of vocalizations produced by six infants recorded longitudinally between ages 3 and 11 months are analyzed using a neural network consisting of a self-organizing map and a single-layer perceptron. The self-organizing map acquires a set of holistic, data-derived spectrographic receptive fields. The single-layer perceptron receives self-organizing map activations as input and is trained to classify utterances into prelinguistic phonatory categories (squeal, vocant, or growl), identify the ages at which they were produced, and identify the individuals who produced them. Classification performance was significantly better than chance for all three classification tasks. Performance is compared to another popular architecture, the fully supervised multilayer perceptron. In addition, the network's weights and patterns of activation are explored from several angles, for example, through traditional acoustic measurements of the network's receptive fields. Results support the use of this and related tools for deriving holistic acoustic features directly from infant vocalization data and for the automatic classification of infant vocalizations.
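The two-stage architecture described in the abstract (an unsupervised self-organizing map whose activations feed a supervised single-layer perceptron) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The synthetic 64-dimensional "spectrograms", the 4 x 4 map, the learning-rate and neighborhood schedules, the Gaussian activation function, and all function names (`train_som`, `som_activations`, `train_perceptron`) are assumptions made for illustration only.

```python
import numpy as np

def sq_dists(X, W):
    """Squared Euclidean distance from every input row to every SOM node."""
    return ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)

def train_som(X, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a small self-organizing map; returns weights (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen training samples.
    W = X[rng.choice(len(X), size=rows * cols, replace=True)].astype(float)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3      # shrinking neighborhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # best-matching unit
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)        # pull neighbors toward input
            step += 1
    return W

def som_activations(X, W, scale):
    """Gaussian activation of each node for each input (assumed form)."""
    return np.exp(-sq_dists(X, W) / scale)

def _with_bias(A):
    return np.hstack([A, np.ones((len(A), 1))])

def train_perceptron(A, y, n_classes, epochs=500, lr=0.5):
    """Single-layer softmax classifier trained by batch gradient descent."""
    A1 = _with_bias(A)
    V = np.zeros((A1.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                         # one-hot targets
    for _ in range(epochs):
        logits = A1 @ V
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        V -= lr * A1.T @ (P - Y) / len(A1)           # cross-entropy gradient step
    return V

def predict(A, V):
    return np.argmax(_with_bias(A) @ V, axis=1)

# Synthetic stand-in data: three classes of noisy 64-dimensional vectors
# (hypothetical; real inputs would be flattened 1-s spectrograms).
rng = np.random.default_rng(42)
n_classes, n_per_class, n_features = 3, 100, 64
prototypes = rng.normal(size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = prototypes[y] + 0.5 * rng.normal(size=(len(y), n_features))
idx = rng.permutation(len(y))
train, test = idx[:240], idx[240:]

W = train_som(X[train])                              # unsupervised stage
scale = sq_dists(X[train], W).mean()                 # fixed activation width
A_train = som_activations(X[train], W, scale)
A_test = som_activations(X[test], W, scale)
V = train_perceptron(A_train, y[train], n_classes)   # supervised stage
accuracy = (predict(A_test, V) == y[test]).mean()
print(f"held-out accuracy: {accuracy:.2f} (chance = {1 / n_classes:.2f})")
```

Each row of `W` can be reshaped to the spectrogram's time-frequency dimensions and inspected as a receptive field, which is the sense in which the map's features are data-derived. The same activations could be reused with different label vectors (category, age, or speaker identity) by retraining only the perceptron stage.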