Heterophonic speech recognition using composite phones.

Authors

Alkhairy Ashraf, Jafri Afshan

Affiliations

King Abdul Aziz City for Science and Technology, Riyadh, Saudi Arabia.

King Saud University, Riyadh, Saudi Arabia.

Publication information

Springerplus. 2016 Nov 24;5(1):2008. doi: 10.1186/s40064-016-3332-9. eCollection 2016.

DOI:10.1186/s40064-016-3332-9
PMID:27933264
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5121111/
Abstract

Heterophones, words that have the same spelling but different pronunciations, pose challenges during training of automatic speech recognition (ASR) systems because the orthographic representation of a word leaves its pronunciation ambiguous. This paper addresses the problem of heterophonic languages by developing the concept of a Composite Phoneme (CP) as a basic pronunciation unit for speech recognition. A CP is a set of alternative sequences of phonemes. CPs are developed specifically in the context of Arabic by defining phonetic units that are consonant-centric and absorb the phonemically contrastive short vowels and gemination that are not represented in Arabic Modern Orthography (MO). CPs alleviate the need to diacritize MO into Classical Orthography (CO), in order to represent short vowels and stress, before generating pronunciation in terms of Simple Phonemes (SP). We develop algorithms to generate CP pronunciation from MO, and SP pronunciation from CO, so that each word maps to a single pronunciation. We investigate the performance of CP, SP, UG (Undiacritized Grapheme), and DG (Diacritized Grapheme) ASR systems. The experimental results suggest that UG and DG are inferior to SP and CP. For the A-SpeechDB corpus with an MO vocabulary of 8,000 words, the word error rates (WERs) with a bigram model and context-dependent phones are 11.78%, 12.64%, and 13.59% for CP, SP_M (SP from manually diacritized CO), and SP_A (SP from automatically diacritized MO), respectively. For a vocabulary of 24,000 MO words, the corresponding WERs are 13.69%, 15.08%, and 16.86%. With a uniform statistical model, SP has a lower WER than CP. With context-independent (CI) phones, CP has a lower WER than SP.
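
The idea of a CP as "a set of alternative sequences of phonemes" can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the paper's actual inventory or algorithm: the toy romanization (`ktb`), the short-vowel set, the `CP_` labels, and the helper names `composite_phoneme`, `cp_pronunciation`, and `covers` are all hypothetical. Each consonant is expanded into the readings it could take once an optional gemination and an optional short vowel are absorbed, so one undiacritized spelling maps to one CP sequence that covers several diacritized readings.

```python
from __future__ import annotations

from itertools import product

# Assumed toy inventory: "" stands for "no short vowel".
SHORT_VOWELS = ("", "a", "i", "u")


def composite_phoneme(consonant: str) -> frozenset[tuple[str, ...]]:
    """Return the alternative simple-phoneme sequences one consonant may stand for."""
    alternatives = set()
    for geminated, vowel in product((False, True), SHORT_VOWELS):
        seq = (consonant, consonant) if geminated else (consonant,)
        if vowel:
            seq += (vowel,)
        alternatives.add(seq)
    return frozenset(alternatives)


def cp_pronunciation(undiacritized: str) -> list[str]:
    """Map an undiacritized consonant string to a single CP label sequence."""
    return [f"CP_{c}" for c in undiacritized]


def covers(undiacritized: str, phones: list[str]) -> bool:
    """Check whether a simple-phoneme reading is allowed by the word's CP sequence."""

    def match(i: int, j: int) -> bool:
        # i indexes consonants in the spelling, j indexes simple phonemes.
        if i == len(undiacritized):
            return j == len(phones)
        for alt in composite_phoneme(undiacritized[i]):
            k = j + len(alt)
            if tuple(phones[j:k]) == alt and match(i + 1, k):
                return True
        return False

    return match(0, 0)


if __name__ == "__main__":
    print(cp_pronunciation("ktb"))
    for reading in ("kataba", "kutub", "kattaba", "darasa"):
        print(reading, covers("ktb", list(reading)))
```

In this toy setup the readings `kataba`, `kutub`, and `kattaba` are all covered by the single CP pronunciation of `ktb`, whereas an SP lexicon would need the diacritized form to choose among them. Long vowels, stress, and the real Arabic phoneme set are deliberately left out.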

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5441/5121111/adb4c60d72e1/40064_2016_3332_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5441/5121111/937b78f028fd/40064_2016_3332_Fig1_HTML.jpg
