Data-driven automated acoustic analysis of human infant vocalizations using neural network tools.

Affiliation

School of Audiology and Speech-Language Pathology, The University of Memphis, 807 Jefferson Avenue, Memphis, Tennessee 38105, USA.

Publication information

J Acoust Soc Am. 2010 Apr;127(4):2563-77. doi: 10.1121/1.3327460.

DOI: 10.1121/1.3327460

PMID: 20370038

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2865706/

Abstract

Acoustic analysis of infant vocalizations has typically employed traditional acoustic measures drawn from adult speech acoustics, such as f(0), duration, formant frequencies, amplitude, and pitch perturbation. Here an alternative and complementary method is proposed in which data-derived spectrographic features are central. 1-s-long spectrograms of vocalizations produced by six infants recorded longitudinally between ages 3 and 11 months are analyzed using a neural network consisting of a self-organizing map and a single-layer perceptron. The self-organizing map acquires a set of holistic, data-derived spectrographic receptive fields. The single-layer perceptron receives self-organizing map activations as input and is trained to classify utterances into prelinguistic phonatory categories (squeal, vocant, or growl), identify the ages at which they were produced, and identify the individuals who produced them. Classification performance was significantly better than chance for all three classification tasks. Performance is compared to another popular architecture, the fully supervised multilayer perceptron. In addition, the network's weights and patterns of activation are explored from several angles, for example, through traditional acoustic measurements of the network's receptive fields. Results support the use of this and related tools for deriving holistic acoustic features directly from infant vocalization data and for the automatic classification of infant vocalizations.
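The two-stage architecture described in the abstract (an unsupervised self-organizing map whose node activations feed a supervised single-layer perceptron) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 4x4 map, the Gaussian activation function, the softmax output layer, all hyperparameters, and the synthetic 64-value "spectrograms" standing in for real recordings are assumptions made purely for illustration.

```python
import numpy as np

def make_synthetic(n_per_class=100, n_features=64, n_classes=3, noise=0.1, seed=1):
    """Synthetic stand-in for flattened spectrograms: one noisy prototype per class."""
    rng = np.random.default_rng(seed)
    prototypes = rng.random((n_classes, n_features))
    labels = np.repeat(np.arange(n_classes), n_per_class)
    data = prototypes[labels] + noise * rng.normal(size=(len(labels), n_features))
    order = rng.permutation(len(labels))
    return data[order], labels[order]

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a small self-organizing map; returns weights of shape (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    n_nodes = grid[0] * grid[1]
    # Grid position of every node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])], dtype=float)
    # Initialize each node from a randomly chosen training sample.
    weights = data[rng.choice(len(data), n_nodes, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays to zero
            sigma = sigma0 * (1.0 - frac) + 0.3    # neighborhood shrinks over time
            x = data[idx]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def squared_distances(data, weights):
    """Squared Euclidean distance from every sample to every SOM node."""
    return ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)

def som_activations(data, weights, scale):
    """Node activations: close to 1 when the input resembles a node's receptive field."""
    return np.exp(-squared_distances(data, weights) / scale)

def train_perceptron(acts, labels, n_classes, epochs=300, lr=0.5):
    """Single-layer (softmax) perceptron trained by batch gradient descent."""
    n, d = acts.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = acts @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n
        W -= lr * acts.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def run_demo():
    """Train SOM, then perceptron, on 80% of the data; return held-out accuracy."""
    data, labels = make_synthetic()
    split = int(0.8 * len(data))
    weights = train_som(data[:split])
    # The activation scale is estimated on the training portion only.
    scale = squared_distances(data[:split], weights).mean()
    W, b = train_perceptron(som_activations(data[:split], weights, scale), labels[:split], 3)
    predictions = (som_activations(data[split:], weights, scale) @ W + b).argmax(axis=1)
    return float((predictions == labels[split:]).mean())

if __name__ == "__main__":
    print("held-out accuracy:", run_demo())
```

Each row of the weight matrix returned by `train_som` can be reshaped into a spectrogram-sized image, which is one way to inspect the data-derived receptive fields the abstract refers to. Replacing the three synthetic class labels with age or speaker labels would reuse the same SOM features for the other two classification tasks.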

Similar articles

Acoustic observations in young children's non-cry vocalizations.

J Acoust Soc Am. 1988 May;83(5):1876-82. doi: 10.1121/1.396523.

Flow Glottogram Characteristics and Perceived Degree of Phonatory Pressedness.

J Voice. 2016 May;30(3):287-92. doi: 10.1016/j.jvoice.2015.03.014. Epub 2015 May 20.

[The acoustic characteristics of the vocalizations of hypoacusic children].

Vestn Otorinolaringol. 1998(4):62-4.

[Use of self-organizing neural networks (Kohonen maps) for classification of voice acoustic signals exemplified by the infant voice with and without time-delayed auditory feedback].

HNO. 1996 Apr;44(4):201-6.

Acoustic features of infant vocalic utterances at 3, 6, and 9 months.

J Acoust Soc Am. 1982 Aug;72(2):353-65. doi: 10.1121/1.388089.

Gender classification in children based on speech characteristics: using fundamental and formant frequencies of Malay vowels.

J Voice. 2013 Mar;27(2):201-9. doi: 10.1016/j.jvoice.2012.12.006.

Finding good acoustic features for parrot vocalizations: the feature generation approach.

J Acoust Soc Am. 2011 Feb;129(2):1089-99. doi: 10.1121/1.3531953.

Formant characteristics of human laughter.

J Voice. 2011 Jan;25(1):32-7. doi: 10.1016/j.jvoice.2009.06.010. Epub 2010 Apr 8.

Performance of Acoustic Measures for the Discrimination Among Healthy, Rough, Breathy, and Strained Voices Using the Feedforward Neural Network.

J Voice. 2025 Jan;39(1):1-9. doi: 10.1016/j.jvoice.2022.07.002. Epub 2022 Aug 23.

Cited by

From data to discovery: Technology propels speech-language research and theory-building in developmental science.

Neurosci Biobehav Rev. 2025 Jul;174:106199. doi: 10.1016/j.neubiorev.2025.106199. Epub 2025 May 5.

Emerging Verbal Functions in Early Infancy: Lessons from Observational and Computational Approaches on Typical Development and Neurodevelopmental Disorders.

Adv Neurodev Disord. 2022 Dec;6(4):369-388. doi: 10.1007/s41252-022-00300-7. Epub 2022 Oct 25.

Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations.

Speech Commun. 2021 Oct;133:41-61. doi: 10.1016/j.specom.2021.07.010. Epub 2021 Aug 18.

Reliability of Listener Judgments of Infant Vocal Imitation.

Front Psychol. 2019 Jun 11;10:1340. doi: 10.3389/fpsyg.2019.01340. eCollection 2019.

Language Origins Viewed in Spontaneous and Interactive Vocal Rates of Human and Bonobo Infants.

Front Psychol. 2019 Apr 2;10:729. doi: 10.3389/fpsyg.2019.00729. eCollection 2019.

Methods for eliciting, annotating, and analyzing databases for child speech development.

Comput Speech Lang. 2017 Sep;45:278-299. doi: 10.1016/j.csl.2017.02.010.

Functional flexibility of infant vocalization and the emergence of language.

Proc Natl Acad Sci U S A. 2013 Apr 16;110(16):6318-23. doi: 10.1073/pnas.1300337110. Epub 2013 Apr 2.
