Acoustic voice variation within and between speakers.

Affiliations

Department of Head and Neck Surgery, UCLA School of Medicine, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA.

Department of Linguistics, University of California, Los Angeles, 3125 Campbell Hall, Box 951543, Los Angeles, California 90095-1543, USA.

Publication information

J Acoust Soc Am. 2019 Sep;146(3):1568. doi: 10.1121/1.5125134.

DOI:10.1121/1.5125134

PMID:31590565

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6909978/

Abstract

Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
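The kind of analysis the abstract describes, principal component analysis run once per talker and once over all talkers combined, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic random numbers, the talker and frame counts are placeholders, and both the z-scoring step and the helper name `pca_explained_variance` are assumptions made for illustration. Only the count of 26 variables comes from the abstract.

```python
import numpy as np


def pca_explained_variance(X):
    """PCA via SVD. Rows are measurement frames, columns are acoustic variables.

    Returns (explained_variance_ratio, components); components has one
    principal axis per row. Hypothetical helper, not from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: z-score each variable so that differences in units
    # (for example Hz versus dB) do not dominate the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data are proportional
    # to the variance captured by each component.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    variance = s ** 2 / (Z.shape[0] - 1)
    return variance / variance.sum(), vt


# Synthetic stand-in data: 4 talkers x 500 frames x 26 variables.
rng = np.random.default_rng(0)
n_talkers, n_frames, n_vars = 4, 500, 26
talker_offsets = rng.normal(size=(n_talkers, 1, n_vars))
frames = talker_offsets + rng.normal(size=(n_talkers, n_frames, n_vars))

# Within-talker analysis: one PCA per talker.
within_ratios = [pca_explained_variance(frames[i])[0] for i in range(n_talkers)]

# Combined analysis: one PCA over the frames of all talkers pooled together.
pooled_ratio, pooled_components = pca_explained_variance(frames.reshape(-1, n_vars))

print("pooled, first component share:", round(float(pooled_ratio[0]), 3))
for i, ratio in enumerate(within_ratios):
    print(f"talker {i}, first component share:", round(float(ratio[0]), 3))
```

Comparing the leading rows of `pooled_components` with the per-talker components is one way to ask whether individual and population spaces share structure, which is the comparison the abstract reports on real recordings.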
