

Gender recognition from speech. Part II: Fine analysis.

Author Information

Childers D G, Wu K

Affiliation

Department of Electrical Engineering, University of Florida, Gainesville 32611-2024.

Publication Information

J Acoust Soc Am. 1991 Oct;90(4 Pt 1):1841-56. doi: 10.1121/1.401664.

Abstract

The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In Part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper a detailed, fine analysis of the characteristics of vowels is performed, including formant frequencies, bandwidths, and amplitudes, as well as speaker fundamental frequency of voicing. The fine analysis used a pitch-synchronous closed-phase analysis technique. Detailed formant features, including frequencies, bandwidths, and amplitudes, were extracted by a closed-phase weighted recursive least-squares method that employed a variable forgetting factor, i.e., WRLS-VFF. The electroglottograph signal was used to locate the closed-phase portion of the speech signal. A two-way statistical analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Numerous interesting results were obtained, including the fact that the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for male speakers. The results suggest that redundant gender information was embedded in the fundamental frequency and vocal tract resonance characteristics.
The feature vectors for female voices were observed to have higher within-group variations than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
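The paper's closed-phase WRLS-VFF formant tracker is not reproduced here, but its core recursion, recursive least-squares estimation of LPC (autoregressive) coefficients with a forgetting factor that adapts to the prediction error, can be sketched. The function name, the simple error-driven forgetting rule, and all parameter values below are illustrative assumptions, not the algorithm as published.

```python
import numpy as np

def rls_vff_lpc(x, p=2, lam_min=0.90, sigma0=1.0):
    """Recursive-least-squares AR(p) estimation with a variable
    forgetting factor -- a simplified illustration of the WRLS-VFF
    idea, not the paper's closed-phase formant tracker.

    Returns a such that x[n] ~ a[0]*x[n-1] + ... + a[p-1]*x[n-p].
    """
    a = np.zeros(p)          # coefficient estimates
    P = np.eye(p) * 1e3      # inverse correlation matrix (large init)
    for n in range(p, len(x)):
        phi = x[n - p:n][::-1]           # regressor: p most recent samples
        e = x[n] - phi @ a               # a priori prediction error
        # Illustrative rule: forget faster (smaller lambda) when the
        # prediction error is large, i.e., when the signal is changing.
        lam = max(lam_min, 1.0 - e * e / sigma0)
        g = P @ phi / (lam + phi @ P @ phi)    # gain vector
        a = a + g * e                          # coefficient update
        P = (P - np.outer(g, phi) @ P) / lam   # covariance update
    return a

# Usage: recover the coefficients of a synthetic AR(2) process.
rng = np.random.default_rng(0)
x = np.zeros(500)
for n in range(2, len(x)):
    x[n] = 1.5 * x[n - 1] - 0.7 * x[n - 2] + 0.01 * rng.standard_normal()
a_hat = rls_vff_lpc(x, p=2)
```

In a formant-analysis setting, the formant frequencies and bandwidths would then be derived from the roots of the estimated LPC polynomial; restricting the updates to the closed-phase portion of each glottal cycle (located here via the electroglottograph signal) reduces the influence of the glottal source on the vocal tract estimate.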

