Quantitative Analysis of Multimodal Speech Data

Authors

Gordon Danner Samantha, Vilela Barbosa Adriano, Goldstein Louis

Affiliations

University of Southern California, Los Angeles, CA, 90089, USA.

Federal University of Minas Gerais, Belo Horizonte, MG, CEP 31270-901, Brazil.

Publication information

J Phon. 2018 Nov;71:268-283. doi: 10.1016/j.wocn.2018.09.007. Epub 2018 Oct 19.

DOI:10.1016/j.wocn.2018.09.007

PMID:30618477

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6319935/

Abstract

This study presents techniques for quantitatively analyzing coordination and kinematics in multimodal speech using video, audio and electromagnetic articulography (EMA) data. Multimodal speech research has flourished due to recent improvements in technology, yet gesture detection and annotation strategies vary widely, which makes it difficult to generalize across studies and to advance this field of research. We describe how FlowAnalyzer software can be used to extract kinematic signals from basic video recordings, and we apply a technique, derived from speech kinematic research, to detect bodily gestures in these kinematic signals. We investigate whether the kinematic characteristics of multimodal speech differ depending on communicative context, and we find that these contexts can be distinguished quantitatively, suggesting a way to improve and standardize existing gesture identification and annotation strategies. We also discuss a method, Correlation Map Analysis (CMA), for quantifying the relationship between speech and bodily gesture kinematics over time. We describe potential applications of CMA to multimodal speech research, such as describing characteristics of speech-gesture coordination in different communicative contexts. The techniques presented here can improve and advance multimodal speech and gesture research by applying quantitative methods to the detection and description of multimodal speech.
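As a rough illustration of what "quantifying the relationship between speech and bodily gesture kinematics over time" can mean, the sketch below computes a Pearson correlation between two signals inside a sliding window, repeated over a range of time lags. It is a simplified stand-in and not the CMA algorithm described in the paper; the function names, the window length, the lag range, and the synthetic signals are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def windowed_lag_correlation(x, y, window, max_lag):
    """Hypothetical helper: map each lag to a list of windowed correlations.

    For a given lag, entry i compares x[start:start+window] with
    y[start+lag:start+lag+window], where start = max_lag + i.
    """
    n = min(len(x), len(y))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        row = []
        # Start at max_lag and stop early so shifted windows stay in bounds.
        for start in range(max_lag, n - window - max_lag + 1):
            xs = x[start:start + window]
            ys = y[start + lag:start + lag + window]
            row.append(pearson(xs, ys))
        result[lag] = row
    return result

# Synthetic data (assumption): the "gesture" signal is the "speech" signal
# delayed by 5 samples.
speech = [math.sin(2 * math.pi * t / 40) for t in range(200)]
gesture = [0.0] * 5 + speech[:-5]

cmap = windowed_lag_correlation(speech, gesture, window=40, max_lag=10)

# The lag with the highest mean correlation estimates the delay between signals.
best_lag = max(cmap, key=lambda k: sum(cmap[k]) / len(cmap[k]))
print(best_lag)
```

Each row of the result gives the correlation over time at one lag, so the full dictionary can be read as a time-by-lag map. A fixed rectangular window is used here only for simplicity; other weighting schemes are possible.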

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/66c5/6319935/77adbc23c141/nihms-1510425-f0001.jpg
