Asgari Meysam, Shafran Izhak, Bayestehtashk Alireza
Center for Spoken Language Understanding, OHSU, Portland, OR, USA.
SLT Workshop Spok Lang Technol. 2012 Dec;2012:438-442. doi: 10.1109/slt.2012.6424264. Epub 2013 Feb 1.
We investigate methods for detecting voiced segments in everyday conversations from ambient recordings. Such recordings contain a high diversity of background noise, making it difficult or infeasible to collect representative labelled samples for estimating noise-specific HMM models. The popular utility and its derivatives compute normalized cross-correlation for detecting voiced segments, which unfortunately is sensitive to different types of noise. Exploiting the fact that voiced speech is not just periodic but also rich in harmonics, we model voiced segments by adopting harmonic models, which have recently gained considerable attention. In previous work, the parameters of the model were estimated independently for each frame using a maximum likelihood criterion. However, since the distribution of harmonic coefficients depends on the speakers' articulators, we estimate the model parameters more robustly using a maximum criterion. We use the likelihood of voicing, computed from the harmonic model, as the observation probability of an HMM and detect speech using this unsupervised HMM. One caveat of harmonic models is that they fail to distinguish speech from other stationary harmonic noise. We rectify this weakness by taking advantage of the non-stationary property of speech. We evaluate our models empirically on the task of detecting speech in a large corpus of everyday speech and demonstrate that these models perform significantly better than the standard voice detection algorithm employed in popular tools.
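To make the per-frame harmonic model concrete, the following is a minimal sketch of the frame-independent maximum likelihood fit that the abstract attributes to previous work. It is not the authors' implementation. It assumes additive white Gaussian noise, under which the maximum likelihood coefficients reduce to a least-squares fit, and it uses the fraction of frame energy explained by the harmonics as a stand-in voicing score. The function names, the candidate pitch grid, and the number of harmonics are hypothetical choices made for illustration.

```python
import numpy as np


def harmonic_basis(f0, n_samples, sample_rate, n_harmonics):
    """Design matrix with a DC column plus cosine/sine columns at multiples of f0."""
    t = np.arange(n_samples) / sample_rate
    h = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, h)
    return np.hstack([np.ones((n_samples, 1)), np.cos(phase), np.sin(phase)])


def voicing_score(frame, sample_rate, f0_grid, n_harmonics=5):
    """Fit the harmonic model at each candidate pitch (hypothetical helper).

    Returns the best candidate f0 and the fraction of the frame's energy
    explained by the fitted harmonics at that pitch.
    """
    frame = np.asarray(frame, dtype=float)
    total_energy = float(frame @ frame) + 1e-12  # guard against silent frames
    best_f0, best_ratio = None, -np.inf
    for f0 in f0_grid:
        basis = harmonic_basis(f0, frame.size, sample_rate, n_harmonics)
        # Least squares equals maximum likelihood under white Gaussian noise
        # (an assumption of this sketch).
        coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
        residual = frame - basis @ coeffs
        ratio = 1.0 - float(residual @ residual) / total_energy
        if ratio > best_ratio:
            best_f0, best_ratio = float(f0), ratio
    return best_f0, best_ratio


if __name__ == "__main__":
    sample_rate = 8000
    t = np.arange(240) / sample_rate  # one 30 ms frame
    frame = np.cos(2 * np.pi * 200 * t) + 0.5 * np.cos(2 * np.pi * 400 * t)
    grid = np.arange(100.0, 401.0, 5.0)  # candidate pitches in Hz
    print(voicing_score(frame, sample_rate, grid))
```

A score near 1 suggests a strongly harmonic frame, while broadband noise yields a low score. In the approach the abstract describes, a voicing likelihood of this kind would serve as the observation probability of an HMM over frames. Stationary harmonic noise would also score highly here, which is the weakness the abstract addresses by exploiting the non-stationarity of speech.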