


Modulation transfer functions for audiovisual speech.

Affiliations

Hearing Systems, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark.

Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark.

Publication Information

PLoS Comput Biol. 2022 Jul 19;18(7):e1010273. doi: 10.1371/journal.pcbi.1010273. eCollection 2022 Jul.

DOI:10.1371/journal.pcbi.1010273
PMID:35852989
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9295967/
Abstract

Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, which allowed us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.

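The core step of the abstract's method can be sketched in a few lines: given speech-envelope features X (e.g. modulation filterbank outputs) and facial-motion features Y (e.g. 3D landmark velocities), regularized CCA finds projections whose outputs are maximally correlated, with ridge terms added to the auto-covariances. This is a minimal illustrative sketch, not the authors' implementation; the feature construction, whitening-based solver, and regularization strength are all assumptions.

```python
import numpy as np

def rcca(X, Y, reg=0.1):
    """Regularized CCA, first canonical component.

    X: (T, p) array, e.g. envelope modulation-filterbank outputs over time
    Y: (T, q) array, e.g. facial-landmark motion features over time
    reg: ridge term added to the auto-covariances (the regularization in rCCA)
    Returns (a, b, rho): projection weights for X and Y, and their
    canonical correlation rho.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    T = X.shape[0]
    Cxx = X.T @ X / T + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / T + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / T
    # Whiten both spaces via Cholesky factors, then take the leading
    # singular pair of the whitened cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    a = Wx.T @ U[:, 0]   # envelope filter weights (an "MTF" over filterbank bands)
    b = Wy.T @ Vt[0, :]  # facial-motion weights (e.g. mouth vs. global motion)
    return a, b, s[0]
```

In the paper's setting, the learned weights `a` over envelope modulation bands play the role of the MTFs, and the spatial pattern of `b` over landmarks indicates which facial regions (mouth vs. global head motion) carry the correlation.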

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/275c/9295967/f7d64ddc38d1/pcbi.1010273.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/275c/9295967/2e03f824fde6/pcbi.1010273.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/275c/9295967/c8ea5bcc53d6/pcbi.1010273.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/275c/9295967/82be9f6fea33/pcbi.1010273.g004.jpg

Similar Articles

1
Modulation transfer functions for audiovisual speech.
PLoS Comput Biol. 2022 Jul 19;18(7):e1010273. doi: 10.1371/journal.pcbi.1010273. eCollection 2022 Jul.
2
Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope.
Neurosci Biobehav Rev. 2023 Apr;147:105111. doi: 10.1016/j.neubiorev.2023.105111. Epub 2023 Feb 22.
3
Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions.
J Neurosci. 2015 Oct 21;35(42):14195-204. doi: 10.1523/JNEUROSCI.1829-15.2015.
4
The natural statistics of audiovisual speech.
PLoS Comput Biol. 2009 Jul;5(7):e1000436. doi: 10.1371/journal.pcbi.1000436. Epub 2009 Jul 17.
5
Predicted effects of sensorineural hearing loss on across-fiber envelope coding in the auditory nerve.
J Acoust Soc Am. 2011 Jun;129(6):4001-13. doi: 10.1121/1.3583502.
6
Effects of Visual Speech Envelope on Audiovisual Speech Perception in Multitalker Listening Environments.
J Speech Lang Hear Res. 2021 Jul 16;64(7):2845-2853. doi: 10.1044/2021_JSLHR-20-00688. Epub 2021 Jun 8.
7
Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration.
J Neurosci. 2016 Sep 21;36(38):9888-95. doi: 10.1523/JNEUROSCI.1396-16.2016.
8
Complex Mapping between Neural Response Frequency and Linguistic Units in Natural Speech.
J Cogn Neurosci. 2023 Aug 1;35(8):1361-1368. doi: 10.1162/jocn_a_02013.
9
Human Frequency Following Responses to Vocoded Speech.
Ear Hear. 2017 Sep/Oct;38(5):e256-e267. doi: 10.1097/AUD.0000000000000432.
10
Effect of reducing slow temporal modulations on speech reception.
J Acoust Soc Am. 1994 May;95(5 Pt 1):2670-80. doi: 10.1121/1.409836.

Cited By

1
A Visual Speech Intelligibility Benefit Based on Speech Rhythm.
Brain Sci. 2023 Jun 8;13(6):932. doi: 10.3390/brainsci13060932.

References

1
Acoustically Driven Cortical δ Oscillations Underpin Prosodic Chunking.
eNeuro. 2021 Jul 9;8(4). doi: 10.1523/ENEURO.0562-20.2021. Print 2021 Jul-Aug.
2
Auditory stimulus-response modeling with a match-mismatch task.
J Neural Eng. 2021 May 4;18(4). doi: 10.1088/1741-2552/abf771.
3
Synchronous facial action binds dynamic facial features.
Sci Rep. 2021 Mar 30;11(1):7191. doi: 10.1038/s41598-021-86725-x.
4
The interrelationship between the face and vocal tract configuration during audiovisual speech.
Proc Natl Acad Sci U S A. 2020 Dec 22;117(51):32791-32798. doi: 10.1073/pnas.2006192117. Epub 2020 Dec 8.
5
Remote Heart Rate Estimation Based on 3D Facial Landmarks.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:2634-2637. doi: 10.1109/EMBC44109.2020.9176563.
6
Sequences of Intonation Units form a ~1 Hz rhythm.
Sci Rep. 2020 Sep 28;10(1):15846. doi: 10.1038/s41598-020-72739-4.
7
Theta Synchronization of Phonatory and Articulatory Systems in Marmoset Monkey Vocal Production.
Curr Biol. 2020 Nov 2;30(21):4276-4283.e3. doi: 10.1016/j.cub.2020.08.019. Epub 2020 Sep 3.
8
Evolution of the speech-ready brain: The voice/jaw connection in the human motor cortex.
J Comp Neurol. 2021 Apr 1;529(5):1018-1028. doi: 10.1002/cne.24997. Epub 2020 Sep 8.
9
Acoustic information about upper limb movement in voicing.
Proc Natl Acad Sci U S A. 2020 May 26;117(21):11364-11367. doi: 10.1073/pnas.2004163117. Epub 2020 May 11.
10
Speech rhythms and their neural foundations.
Nat Rev Neurosci. 2020 Jun;21(6):322-334. doi: 10.1038/s41583-020-0304-4. Epub 2020 May 6.