Convex weighting criteria for speaking rate estimation.

Authors

Jiao Yishan, Berisha Visar, Tu Ming, Liss Julie

Affiliation

Department of Speech and Hearing Science, Arizona State University.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2015 Sep;23(9):1421-1430. doi: 10.1109/TASLP.2015.2434213.

Abstract

Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose the speaking rate estimation problem as that of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal, and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Rather, the proposed method aims to learn an optimal weighting function that can be directly applied to time-frequency features in a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting functions and an adaptation strategy to customize the approach to a particular speaker using minimal training. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech rate estimation, the results show a high correlation between the estimated speaking rate and ground truth values.
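
The density formulation in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features or the two convex cost functions, so everything below is an assumption made for illustration: the per-frame density is taken to be a linear function w·x_t of a time-frequency feature vector, the training targets are per-utterance syllable (or vowel) counts, and the convex cost is a plain ridge-regularized least-squares fit standing in for the paper's criteria. The names `fit_weights`, `density`, and `speaking_rate` are hypothetical.

```python
import numpy as np

def fit_weights(features, counts, ridge=1e-3):
    """Learn a weighting vector w from utterance-level counts.

    features: list of (T_i, D) arrays, one time-frequency feature matrix per utterance.
    counts:   one syllable (or vowel) count per utterance.
    NOTE: ridge least squares is an assumed stand-in for the paper's cost functions.
    """
    # Summing w.x_t over all frames equals w.(sum of x_t), so each utterance
    # contributes a single summed feature vector to the fit.
    S = np.stack([f.sum(axis=0) for f in features])  # shape (N, D)
    c = np.asarray(counts, dtype=float)
    dim = S.shape[1]
    # Convex quadratic objective with a closed-form minimizer.
    return np.linalg.solve(S.T @ S + ridge * np.eye(dim), S.T @ c)

def density(w, feats):
    """Per-frame temporal density: one value per frame."""
    return feats @ w

def speaking_rate(w, feats, frame_period_s):
    """Sum the density over the interval and divide by its duration in seconds."""
    duration_s = feats.shape[0] * frame_period_s
    return density(w, feats).sum() / duration_s

if __name__ == "__main__":
    # Synthetic data only; these are not results from any corpus.
    rng = np.random.default_rng(0)
    w_true = rng.uniform(0.0, 0.1, size=8)
    train = [rng.uniform(size=(int(rng.integers(50, 150)), 8)) for _ in range(40)]
    train_counts = [float((f @ w_true).sum()) for f in train]
    w_hat = fit_weights(train, train_counts)
    test_utt = rng.uniform(size=(120, 8))
    print(speaking_rate(w_hat, test_utt, frame_period_s=0.01))
```

Because the density is summed rather than thresholded, no individual phoneme or vowel has to be detected: the count over any sub-interval is simply the sum of `density(w, feats)` over the frames in that interval.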
