Multimodal Attention Network for Trauma Activity Recognition from Spoken Language and Environmental Sound.

Author information

Gu Yue, Zhang Ruiyu, Zhao Xinwei, Chen Shuhong, Abdulbaqi Jalal, Marsic Ivan, Cheng Megan, Burd Randall S

Affiliations

Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.

Trauma and Burn Surgery, Children's National Medical Center, Washington, DC, USA.

Publication information

Proc (IEEE Int Conf Healthc Inform). 2019 Jun;2019. doi: 10.1109/ichi.2019.8904713. Epub 2019 Nov 21.

Abstract

Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) during a trauma resuscitation. Previous work has mainly focused on using various sensor data including image, RFID, and vital signals to generate the trauma event log. However, spoken language and environmental sound, which contain rich communication and contextual information necessary for trauma team cooperation, are still largely ignored. In this paper, we propose a multimodal attention network (MAN) that uses both verbal transcripts and environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module, and forms a final shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital. We achieved 72.4% accuracy with 0.705 F1 score, demonstrating that our proposed architecture is useful and efficient. These results also show that using spoken language and environmental audio indeed helps identify hard-to-recognize activities, compared to previous approaches. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.
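The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer sizes, learned projections, or what "multi-level" means in detail, so everything below is an assumption made for illustration. The sketch applies multi-head self-attention (without learned projection matrices) to a sequence of text vectors and a sequence of audio-frame vectors, mean-pools each, concatenates the two into a shared representation, and scores a hypothetical set of three activity classes with random weights. All names (`multi_head_self_attention`, `classify`, the feature dimensions) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq.

    Assumption: no learned projections, to keep the sketch short.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def multi_head_self_attention(seq, num_heads):
    """Split each vector into num_heads slices, attend per slice, re-concatenate."""
    d = len(seq[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    h = d // num_heads
    heads = [
        self_attention([x[i * h:(i + 1) * h] for x in seq])
        for i in range(num_heads)
    ]
    return [[v for head in heads for v in head[t]] for t in range(len(seq))]


def mean_pool(seq):
    """Average a sequence of vectors into a single vector."""
    n = len(seq)
    return [sum(x[j] for x in seq) / n for j in range(len(seq[0]))]


def classify(text_feats, audio_feats, weights, num_heads=2):
    """Attend over each modality, fuse by concatenation, return class probabilities."""
    text_vec = mean_pool(multi_head_self_attention(text_feats, num_heads))
    audio_vec = mean_pool(multi_head_self_attention(audio_feats, num_heads))
    shared = text_vec + audio_vec  # list concatenation: the shared representation
    logits = [sum(w * s for w, s in zip(row, shared)) for row in weights]
    return softmax(logits)


# Toy inputs (random, for illustration only; not data from the paper).
rng = random.Random(0)
text_feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]   # 5 word vectors
audio_feats = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]  # 8 audio frames
weights = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]      # 3 hypothetical classes
probs = classify(text_feats, audio_feats, weights)
```

In a trained model the projection matrices and classifier weights would be learned from labeled resuscitation data; here they are random, so `probs` only demonstrates the data flow from two modalities to one probability vector.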
