Speaker-Independent Phoneme Alignment Using Transition-Dependent States.

Author information

Hosom John-Paul

Affiliation

Center for Spoken Language Understanding, School of Science & Engineering, Oregon Health & Science University, 20000 NW Walker Road, Beaverton, OR 97006 USA

Publication information

Speech Commun. 2009 Apr;51(4):352-368. doi: 10.1016/j.specom.2008.11.003.

Abstract

Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Agreement of humans on the location of phonemes is, on average, 93.78% within 20 msec on a variety of corpora, and 93.49% within 20 msec on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 msec, and performance of the proposed system on this corpus is 93.36% within 20 msec. The results of the proposed system are a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This result of 93.36% agreement is the best known reported result on the TIMIT corpus.
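The two figures of merit quoted in the abstract, agreement within 20 msec and relative reduction in error, can be illustrated with a short sketch. It is not the paper's scoring code: the function names, the one-to-one pairing of reference and hypothesized boundaries, the inclusive tolerance, and the toy boundary times are all assumptions made for illustration.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20):
    """Fraction of boundaries placed within tolerance_ms of the reference.

    Assumes (hypothetically) that boundaries are already paired one-to-one,
    as in forced alignment where the phoneme sequence is known in advance,
    and that a difference exactly equal to the tolerance counts as agreement.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return hits / len(reference_ms)


def relative_error_reduction(baseline_agreement_pct, proposed_agreement_pct):
    """Relative reduction in error, where error is 100% minus agreement."""
    baseline_error = 100.0 - baseline_agreement_pct
    proposed_error = 100.0 - proposed_agreement_pct
    return (baseline_error - proposed_error) / baseline_error


# Toy boundary times in milliseconds (made up for illustration only).
reference = [100, 250, 400, 620]
hypothesis = [110, 280, 400, 610]
toy_agreement = agreement_within(reference, hypothesis)

# Agreement figures taken from the abstract: baseline 91.48%, proposed 93.36%.
reduction = relative_error_reduction(91.48, 93.36)
print(f"toy agreement: {toy_agreement:.2f}")
print(f"relative error reduction: {reduction:.1%}")
```

With the abstract's figures, the baseline error is 8.52% and the proposed system's error is 6.64%, so the relative reduction is 1.88 / 8.52, or about 22%, which matches the reported value.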
