Speech synthesis by glottal excited linear prediction.

Authors

Childers D G, Hu H T

Affiliation

Department of Electrical Engineering, University of Florida, Gainesville 32611-2024.

Publication

J Acoust Soc Am. 1994 Oct;96(4):2026-36. doi: 10.1121/1.411319.

DOI: 10.1121/1.411319
PMID: 7963019
Abstract

This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation. The coefficients of the polynomial model form a vector that represents the glottal excitation waveform for one pitch period. A glottal excitation codebook with 32 entries for voiced excitation is designed and trained using two sentences spoken by different speakers. The purpose of this approach is to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a glottal excitation linear predictive (GELP) synthesizer. This implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code-excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. It is conjectured that the glottal excitation codebook approach could provide a mechanism for quantitatively comparing the differences in glottal excitation codebooks for male and female speakers, for speakers with vocal disorders, and for speakers with different voice types such as breathy and vocal fry voices. Conceivably, one could also convert the voice of a speaker with one voice type, e.g., breathy, to the voice of a speaker with another voice type, e.g., vocal fry, by synthesizing speech using the vocal tract LP parameters for the speaker with the breathy voice excited by the glottal excitation codebook trained for vocal fry.
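
The synthesis path outlined in the abstract (a codebook vector of polynomial coefficients, expanded into one pitch period of excitation, then passed through an all-pole LP filter) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the random toy codebooks, the time normalization to [0, 1), the 80-sample pitch period, the 40-sample noise vectors, and the LP coefficients are all assumptions chosen for illustration; only the polynomial order (6) and the codebook sizes (32 and 256) come from the abstract.

```python
import random

POLY_ORDER = 6  # from the abstract: 6th-order polynomial, so 7 coefficients per entry


def glottal_pulse(coeffs, period_len):
    """Evaluate c0 + c1*t + ... + c6*t^6 over one pitch period.

    Time is normalized to t in [0, 1); this normalization is an assumption.
    """
    pulse = []
    for n in range(period_len):
        t = n / period_len
        value = 0.0
        for c in reversed(coeffs):  # Horner's rule, highest power first
            value = value * t + c
        pulse.append(value)
    return pulse


def nearest_entry(codebook, target):
    """Index of the codebook vector closest to target (squared Euclidean distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, target))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def lp_synthesize(excitation, lp_coeffs, gain=1.0):
    """All-pole LP synthesis: s[n] = gain * e[n] + sum_k a[k] * s[n - k]."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


random.seed(0)
# Toy codebooks filled with random values; trained codebooks would come
# from analysis of recorded speech, which this sketch does not attempt.
voiced_codebook = [[random.uniform(-1, 1) for _ in range(POLY_ORDER + 1)]
                   for _ in range(32)]
noise_codebook = [[random.gauss(0, 1) for _ in range(40)] for _ in range(256)]

# Quantize a (hypothetical) measured coefficient vector to its nearest entry.
measured = [c + 0.01 for c in voiced_codebook[5]]
idx = nearest_entry(voiced_codebook, measured)

# Voiced segment: repeat the quantized pulse for three pitch periods.
voiced_excitation = glottal_pulse(voiced_codebook[idx], 80) * 3
# Unvoiced segment: take one noise vector from the stochastic codebook.
unvoiced_excitation = noise_codebook[17]

lp_coeffs = [1.3, -0.8]  # hypothetical stable 2nd-order vocal tract filter
speech = lp_synthesize(voiced_excitation + unvoiced_excitation, lp_coeffs)
print(idx, len(speech))
```

Under these assumptions, swapping `voiced_codebook` for one trained on a different voice type while keeping `lp_coeffs` fixed corresponds to the voice conversion idea conjectured at the end of the abstract.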

Similar articles

1
Speech synthesis by glottal excited linear prediction.
J Acoust Soc Am. 1994 Oct;96(4):2026-36. doi: 10.1121/1.411319.
2
Vocal quality factors: analysis, synthesis, and perception.
J Acoust Soc Am. 1991 Nov;90(5):2394-410. doi: 10.1121/1.402044.
3
Glottal characteristics of female speakers: acoustic correlates.
J Acoust Soc Am. 1997 Jan;101(1):466-81. doi: 10.1121/1.417991.
4
Modeling the glottal volume-velocity waveform for three voice types.
J Acoust Soc Am. 1995 Jan;97(1):505-19. doi: 10.1121/1.412276.
5
Exploring the anatomical encoding of voice with a mathematical model of the vocal system.
Neuroimage. 2016 Nov 1;141:31-39. doi: 10.1016/j.neuroimage.2016.07.033. Epub 2016 Jul 17.
6
Perception of synthesized voice quality in connected speech by Cantonese speakers.
J Acoust Soc Am. 2002 Sep;112(3 Pt 1):1091-101. doi: 10.1121/1.1500753.
7
Perceived Desirability of Vocal Fry Among Female Speech Communication Disorders Graduate Students.
J Voice. 2019 Sep;33(5):805.e21-805.e35. doi: 10.1016/j.jvoice.2018.03.010. Epub 2018 May 24.
8
Voice source model for continuous control of pitch period.
J Acoust Soc Am. 1993 Feb;93(2):1087-96. doi: 10.1121/1.405557.
9
Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Aug;25(8):1718-1730. doi: 10.1109/taslp.2017.2714839. Epub 2017 Jun 12.
10
Measuring and modeling vocal source-tract interaction.
IEEE Trans Biomed Eng. 1994 Jul;41(7):663-71. doi: 10.1109/10.301733.

Cited by

1
Subglottal Impedance-Based Inverse Filtering of Voiced Sounds Using Neck Surface Acceleration.
IEEE Trans Audio Speech Lang Process. 2013 Sep;21(9):1929-1939. doi: 10.1109/TASL.2013.2263138.