Speaker-Independent Phoneme Alignment Using Transition-Dependent States.

Author information

Hosom John-Paul

Affiliation

Center for Spoken Language Understanding, School of Science & Engineering, Oregon Health & Science University, 20000 NW Walker Road, Beaverton, OR 97006, USA

Publication information

Speech Commun. 2009 Apr;51(4):352-368. doi: 10.1016/j.specom.2008.11.003.

DOI: 10.1016/j.specom.2008.11.003
PMID: 20161342
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2682710/

Abstract

Determining the location of phonemes is important to a number of speech applications, including training of automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Agreement of humans on the location of phonemes is, on average, 93.78% within 20 msec on a variety of corpora, and 93.49% within 20 msec on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system with several modifications to this baseline. Modifications include the addition of energy-based features to the standard cepstral feature set, the use of probabilities of a state transition given an observation, and the computation of probabilities of distinctive phonetic features instead of phoneme-level probabilities. Performance of the baseline system on the test partition of the TIMIT corpus is 91.48% within 20 msec, and performance of the proposed system on this corpus is 93.36% within 20 msec. The results of the proposed system are a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. This result of 93.36% agreement is the best known reported result on the TIMIT corpus.
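
The figures in the abstract can be related to each other with a short calculation. The sketch below is illustrative only and is not the paper's evaluation code: the function names, the toy boundary times, and the choice to count a difference of exactly 20 msec as a match are all assumptions. It shows how a "percent within 20 msec" score could be computed from paired boundary times, and how the 22% relative error reduction follows from the two reported accuracies (91.48% and 93.36%).

```python
from typing import Sequence


def agreement_within(reference_ms: Sequence[float],
                     hypothesis_ms: Sequence[float],
                     tolerance_ms: float = 20.0) -> float:
    """Percent of boundaries whose hypothesized time lies within
    tolerance_ms of the reference time.

    Assumption: boundaries are already paired one-to-one, and a
    difference exactly equal to the tolerance counts as a match.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(1 for ref, hyp in zip(reference_ms, hypothesis_ms)
               if abs(ref - hyp) <= tolerance_ms)
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Relative reduction in error (percent), given two accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    proposed_err = 100.0 - proposed_acc
    return 100.0 * (baseline_err - proposed_err) / baseline_err


# Toy boundary times in milliseconds (hypothetical, not from TIMIT).
reference = [100.0, 250.0, 400.0, 610.0]
hypothesis = [105.0, 275.0, 395.0, 628.0]
toy_agreement = agreement_within(reference, hypothesis)

# Accuracies reported in the abstract: baseline 91.48%, proposed 93.36%.
reduction = relative_error_reduction(91.48, 93.36)

print(f"toy agreement within 20 msec: {toy_agreement:.1f}%")
print(f"relative error reduction: {reduction:.1f}%")
```

In the toy data, three of the four boundaries differ by 20 msec or less, so the agreement is 75%. For the reported accuracies, the error falls from 8.52% to 6.64%, a relative reduction of about 22%, which matches the figure stated in the abstract.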


Similar articles

1
A Tunable Forced Alignment System Based on Deep Learning: Applications to Child Speech.
J Speech Lang Hear Res. 2025 Jul 29;68(7S):3583-3601. doi: 10.1044/2024_JSLHR-24-00347. Epub 2025 Mar 31.
2
Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.
PLoS One. 2018 Oct 10;13(10):e0205355. doi: 10.1371/journal.pone.0205355. eCollection 2018.
3
Automatic recognition of pathological phoneme production.
Folia Phoniatr Logop. 2008;60(6):323-31. doi: 10.1159/000170083. Epub 2008 Nov 11.
4
Improving Text-Independent Forced Alignment to Support Speech-Language Pathologists with Phonetic Transcription.
Sensors (Basel). 2023 Dec 6;23(24):9650. doi: 10.3390/s23249650.
5
Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition.
IEEE Trans Neural Syst Rehabil Eng. 2017 Sep;25(9):1581-1591. doi: 10.1109/TNSRE.2017.2681691. Epub 2017 Mar 13.
6
Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
7
Spectro-temporal modulation energy based mask for robust speaker identification.
J Acoust Soc Am. 2012 May;131(5):EL368-74. doi: 10.1121/1.3697534.
8
Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features.
Sensors (Basel). 2022 Dec 1;22(23):9370. doi: 10.3390/s22239370.
9
Cepstral representation of speech motivated by time-frequency masking: an application to speech recognition.
J Acoust Soc Am. 1996 Jul;100(1):603-14. doi: 10.1121/1.415961.

Cited by

1
Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production.
Cognition. 2016 Apr;149:31-9. doi: 10.1016/j.cognition.2016.01.002. Epub 2016 Jan 9.
2
Determining the relevance of different aspects of formant contours to intelligibility.
Speech Commun. 2014 Apr 1;59:1-9. doi: 10.1016/j.specom.2013.12.001.
3
Using automatic alignment to analyze endangered language data: testing the viability of untrained alignment.
J Acoust Soc Am. 2013 Sep;134(3):2235-46. doi: 10.1121/1.4816491.
4
Spoken Language Derived Measures for Detecting Mild Cognitive Impairment.
IEEE Trans Audio Speech Lang Process. 2011 Sep 1;19(7):2081-2090. doi: 10.1109/TASL.2011.2112351.
