A study of lip movements during spontaneous dialog and its application to voice activity detection.

Authors

Sodoyer David, Rivet Bertrand, Girin Laurent, Savariaux Christophe, Schwartz Jean-Luc, Jutten Christian

Affiliation

Department of Speech and Cognition, GIPSA-lab, UMR 5126 CNRS, Grenoble-INP, Université Stendhal, Université Joseph Fourier, Grenoble, France.

Publication

J Acoust Soc Am. 2009 Feb;125(2):1184-96. doi: 10.1121/1.3050257.

Abstract

This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this aim, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although being in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech+nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of environment noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
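
The abstract does not define the single visual parameter, so the following is only an illustrative sketch of how a visual VAD of this kind could work, not the authors' method. It assumes that lip width and lip height are available per video frame, takes the frame-to-frame change of both as a hypothetical "lip motion" parameter, smooths it with a moving average, and labels frames below a threshold as silence. The function names, the window length, and the threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def lip_motion(width: Sequence[float], height: Sequence[float]) -> List[float]:
    """Hypothetical lip-motion parameter: |change in width| + |change in height|."""
    motion = [0.0]  # the first frame has no predecessor, so its motion is 0
    for t in range(1, len(width)):
        motion.append(abs(width[t] - width[t - 1]) + abs(height[t] - height[t - 1]))
    return motion


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Causal moving average over at most the last `window` samples."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def visual_vad(
    width: Sequence[float],
    height: Sequence[float],
    window: int = 5,        # assumed smoothing length, in frames
    threshold: float = 0.5,  # assumed decision threshold, in lip-size units
) -> List[bool]:
    """Return True for frames labeled non-silence (smoothed lip motion above threshold)."""
    smoothed = moving_average(lip_motion(width, height), window)
    return [m > threshold for m in smoothed]
```

Because the decision relies only on the video stream, a detector of this shape is unaffected by acoustic noise, which is the property the abstract highlights. In practice the window and threshold would have to be tuned on labeled data.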
