Gu Yue, Zhang Ruiyu, Zhao Xinwei, Chen Shuhong, Abdulbaqi Jalal, Marsic Ivan, Cheng Megan, Burd Randall S
Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ, USA.
Trauma and Burn Surgery, Children's National Medical Center, Washington, DC, USA.
Proc (IEEE Int Conf Healthc Inform). 2019 Jun;2019. doi: 10.1109/ichi.2019.8904713. Epub 2019 Nov 21.
Trauma activity recognition aims to detect, recognize, and predict the activities (or tasks) performed during a trauma resuscitation. Previous work has mainly focused on using various sensor data, including images, RFID, and vital signs, to generate the trauma event log. Spoken language and environmental sound, however, which carry rich communication and contextual information necessary for trauma team cooperation, have been largely ignored. In this paper, we propose a multimodal attention network (MAN) that takes both verbal transcripts and the environmental audio stream as input; the model extracts textual and acoustic features using a multi-level multi-head attention module and forms a final shared representation for trauma activity classification. We evaluated the proposed architecture on 75 actual trauma resuscitation cases collected from a hospital, achieving 72.4% accuracy and a 0.705 F1 score, demonstrating that the proposed architecture is useful and efficient. These results also show that, compared with previous approaches, using spoken language and environmental audio helps identify hard-to-recognize activities. We also provide a detailed analysis of the performance and generalization of the proposed multimodal attention network.
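The fusion step described above (attention over textual and acoustic features, merged into a shared representation for classification) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact architecture: the single-head scaled dot-product attention, the cross-modal query/key pairing, the feature dimensions, and the mean-pooling fusion are all illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head scaled dot-product attention.
    q: (n_q, d) queries; k: (n_k, d) keys; v: (n_k, d_v) values."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q, d_v)

def multimodal_fusion(text_feats, audio_feats):
    """Cross-modal attention: text tokens attend over audio frames and
    vice versa, then both contexts are pooled and concatenated into a
    shared representation (assumed fusion scheme, for illustration)."""
    text_ctx = scaled_dot_product_attention(text_feats, audio_feats, audio_feats)
    audio_ctx = scaled_dot_product_attention(audio_feats, text_feats, text_feats)
    shared = np.concatenate([text_ctx.mean(axis=0), audio_ctx.mean(axis=0)])
    return shared  # fed to a classifier over trauma activity labels
```

In practice each modality would pass through several stacked multi-head attention layers ("multi-level multi-head") before fusion, and the shared vector would feed a softmax classifier over the activity labels.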