The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation.

Affiliations

Department of English, Linguistics Program, George Mason University, Fairfax, VA, USA.

Department of Linguistics, University of Alberta, Edmonton, AB, Canada.

Publication information

Phonetica. 2024 Sep 5;81(5):451-508. doi: 10.1515/phon-2024-0015. Print 2024 Oct 28.

DOI: 10.1515/phon-2024-0015
PMID: 39248125
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11449383/

Abstract

Given an orthographic transcription, forced alignment systems automatically determine boundaries between segments in speech, facilitating the use of large corpora. In the present paper, we introduce a neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). MAPS serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model as a tagger, rather than a classifier, motivated by the common understanding that segments are not truly discrete and often overlap. The second is an interpolation technique to allow more precise boundaries than the typical 10 ms limit in modern systems. During testing, all system configurations we trained significantly outperformed the state-of-the-art Montreal Forced Aligner at the 10 ms boundary placement tolerance threshold. The greatest difference achieved was a 28.13% relative performance increase. The Montreal Forced Aligner began to slightly outperform our models at around a 30 ms tolerance. We also reflect on the training process for acoustic modeling in forced alignment. The output targets for these models do not match phoneticians' conception of similarity between phones, and reconciling this tension may require rethinking the task and its output targets, or how speech itself should be segmented.
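
The evaluation described in the abstract compares aligners by how often a predicted boundary falls within a tolerance (for example 10 ms or 30 ms) of a reference boundary, and reports a relative performance increase. The sketch below illustrates one common way such figures are computed. It is an assumption-laden illustration, not the authors' evaluation code: the function names `boundary_accuracy` and `relative_increase`, the one-to-one pairing of boundaries, and the toy boundary times are all hypothetical, and the exact metric used in the paper may differ.

```python
from typing import Sequence


def boundary_accuracy(predicted: Sequence[float],
                      reference: Sequence[float],
                      tolerance_ms: float) -> float:
    """Fraction of predicted boundaries within tolerance_ms of the reference.

    Assumes both sequences hold boundary times in milliseconds and are
    paired one-to-one (same segment sequence for both alignments).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one boundary is required")
    hits = sum(abs(p - r) <= tolerance_ms for p, r in zip(predicted, reference))
    return hits / len(reference)


def relative_increase(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy boundaries in milliseconds; NOT data from the paper.
reference = [100.0, 250.0, 400.0, 620.0]
system_a = [104.0, 262.0, 395.0, 640.0]
system_b = [115.0, 262.0, 395.0, 640.0]

for tolerance in (10.0, 30.0):
    acc_a = boundary_accuracy(system_a, reference, tolerance)
    acc_b = boundary_accuracy(system_b, reference, tolerance)
    print(f"{tolerance:.0f} ms: A={acc_a:.2f}, B={acc_b:.2f}")
```

With these toy values the two systems differ at the 10 ms tolerance but tie at 30 ms, which shows why results are reported per tolerance threshold: a ranking at a strict threshold need not hold at a looser one.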
