AVbook, a high-frame-rate corpus of narrative audiovisual speech for investigating multimodal speech perception.

Affiliations

Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, United Kingdom.

Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany.

Publication information

J Acoust Soc Am. 2023 May 1;153(5):3130. doi: 10.1121/10.0019460.

DOI: 10.1121/10.0019460
PMID: 37249407

Abstract

Seeing a speaker's face can help substantially with understanding their speech, particularly in challenging listening conditions. Research into the neurobiological mechanisms behind audiovisual integration has recently begun to employ continuous natural speech. However, these efforts are impeded by a lack of high-quality audiovisual recordings of a speaker narrating a longer text. Here, we seek to close this gap by developing AVbook, an audiovisual speech corpus designed for cognitive neuroscience studies and audiovisual speech recognition. The corpus consists of 3.6 h of audiovisual recordings of two speakers, one male and one female, each reading 59 passages from a narrative English text. The recordings were acquired at a high frame rate of 119.88 frames/s. The corpus includes phone-level alignment files and a set of multiple-choice questions to test attention to the different passages. We verified the efficacy of these questions in a pilot study. A short written summary is also provided for each recording. To enable audiovisual synchronization when presenting the stimuli, four videos of an electronic clapperboard were recorded with the corpus. The corpus is publicly available to support research into the neurobiology of audiovisual speech processing as well as the development of computer algorithms for audiovisual speech recognition.
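As a rough illustration of how the stated frame rate relates to the phone-level alignments, the sketch below maps a time interval in seconds to the video frames that begin inside it. Only the 119.88 frames/s figure comes from the abstract; the function name `frame_span` and the example interval are hypothetical, and the actual format of the corpus alignment files is not described here.

```python
import math

FPS = 119.88  # frame rate stated in the abstract (frames per second)


def frame_span(start_s, end_s, fps=FPS):
    """Return (first, last) indices of frames whose start time lies in [start_s, end_s).

    Frame k is assumed to start at k / fps seconds, with frame 0 at t = 0.
    If the interval is shorter than one frame, last may be smaller than first,
    meaning no frame starts inside the interval.
    """
    first = math.ceil(start_s * fps)
    last = math.ceil(end_s * fps) - 1
    return first, last


# Hypothetical phone interval from 1.0 s to 1.1 s (not taken from the corpus).
first, last = frame_span(1.0, 1.1)
n_frames = last - first + 1
frame_ms = 1000.0 / FPS  # duration of one frame in milliseconds
print(first, last, n_frames, round(frame_ms, 3))
```

At this frame rate each frame lasts a little over 8 ms, so even a 100 ms phone spans about a dozen video frames.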
