


A study of lip movements during spontaneous dialog and its application to voice activity detection.

Authors

Sodoyer David, Rivet Bertrand, Girin Laurent, Savariaux Christophe, Schwartz Jean-Luc, Jutten Christian

Affiliation

Department of Speech and Cognition, GIPSA-lab, UMR 5126 CNRS, Grenoble-INP, Université Stendhal, Université Joseph Fourier, Grenoble, France.

Publication

J Acoust Soc Am. 2009 Feb;125(2):1184-96. doi: 10.1121/1.3050257.

DOI: 10.1121/1.3050257
PMID: 19206891
Abstract

This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this aim, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although being in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech+nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of environment noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.
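The detection principle the abstract describes — flagging silence whenever a single lip-movement parameter shows little activity — can be sketched as a simple energy threshold on the parameter's frame-to-frame variation. This is an illustrative reconstruction, not the authors' implementation: the parameter choice (e.g. inter-lip area), the smoothing window, and the threshold value are all assumptions.

```python
import numpy as np

def visual_vad(lip_param, window=5, threshold=0.01):
    """Toy visual voice-activity detector.

    lip_param: 1-D array of a lip-shape parameter (e.g. inter-lip
    area) sampled once per video frame. Frames whose smoothed
    movement energy falls below `threshold` are flagged as silence.
    Returns a boolean array: True = speech activity detected.
    """
    # Frame-to-frame variation of the lip parameter ("lip activity").
    velocity = np.abs(np.diff(lip_param, prepend=lip_param[0]))
    # Smooth the movement energy over a short window of frames so a
    # momentary pause mid-utterance is not flagged as silence.
    kernel = np.ones(window) / window
    energy = np.convolve(velocity, kernel, mode="same")
    return energy >= threshold
```

Because the decision uses only video, the detector is unaffected by acoustic conditions, which is the property the paper exploits for nonstationary or competing-speaker noise.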

Similar Articles

1. A study of lip movements during spontaneous dialog and its application to voice activity detection.
   J Acoust Soc Am. 2009 Feb;125(2):1184-96. doi: 10.1121/1.3050257.
2. Seeing to hear better: evidence for early audio-visual interactions in speech identification.
   Cognition. 2004 Sep;93(2):B69-78. doi: 10.1016/j.cognition.2004.01.006.
3. Visual influences on alignment to voice onset time.
   J Speech Lang Hear Res. 2010 Apr;53(2):262-72. doi: 10.1044/1092-4388(2009/08-0247). Epub 2010 Mar 10.
4. Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech.
   J Neurosci. 2020 Jan 29;40(5):1053-1065. doi: 10.1523/JNEUROSCI.1101-19.2019. Epub 2019 Dec 30.
5. Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization.
   J Acoust Soc Am. 2008 Dec;124(6):3918-36. doi: 10.1121/1.2997478.
6. A novel approach to study audiovisual integration in speech perception: localizer fMRI and sparse sampling.
   Brain Res. 2008 Jul 18;1220:142-9. doi: 10.1016/j.brainres.2007.08.027. Epub 2007 Aug 19.
7. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.
   Neuropsychologia. 2014 Dec;65:1-11. doi: 10.1016/j.neuropsychologia.2014.09.031. Epub 2014 Oct 2.
8. SVD-based optimal filtering for noise reduction in dual microphone hearing aids: a real time implementation and perceptual evaluation.
   IEEE Trans Biomed Eng. 2005 Sep;52(9):1563-73. doi: 10.1109/TBME.2005.851517.
9. Speech interactions with linguistic, cognitive, and visuomotor tasks.
   J Speech Lang Hear Res. 2005 Apr;48(2):295-305. doi: 10.1044/1092-4388(2005/020).
10. Perception of intersensory synchrony in audiovisual speech: not that special.
    Cognition. 2011 Jan;118(1):75-83. doi: 10.1016/j.cognition.2010.10.002. Epub 2010 Oct 29.

Cited By

1. The natural statistics of audiovisual speech.
   PLoS Comput Biol. 2009 Jul;5(7):e1000436. doi: 10.1371/journal.pcbi.1000436. Epub 2009 Jul 17.