Abdulbaqi Jalal, Gu Yue, Xu Zhichao, Gao Chenyang, Marsic Ivan, Burd Randall S
Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
Trauma and Burn Surgery, Children's National Medical Center, Washington, DC, USA.
Proc (IEEE Int Conf Healthc Inform). 2020 Nov-Dec;2020. doi: 10.1109/ichi48887.2020.9374372. Epub 2021 Mar 12.
We present a speech-based approach to recognizing team activities during trauma resuscitation. We first analyzed audio recordings of trauma resuscitations in terms of activity frequency, noise level, and activity-related keyword frequency to characterize the dataset. We next evaluated different audio-preprocessing parameters (spectral feature types and audio channels) to find the optimal configuration. We then introduced a novel neural network for recognizing trauma activities, in which a modified VGG network extracts features from the audio input. The output of the modified VGG network is combined with the output of a network that takes keyword text as input, and the combined representation is used to generate activity labels. We compared our system with several baselines and analyzed its performance on specific activities in detail. Our results show that the proposed architecture, using Mel-spectrum spectral coefficient features with stereo-channel input and activity-specific frequent keywords, achieves the highest accuracy and average F1-score.
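The abstract states that the audio branch and the keyword branch are combined to produce activity labels but does not say how. The following Python sketch illustrates one plausible reading, assuming late fusion by concatenation followed by a single linear layer and a softmax. The activity names, keyword vocabulary, weights, and the two-dimensional vector standing in for the modified VGG output are all hypothetical placeholders, not values or code from the paper.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical labels and keyword vocabulary (illustration only; the paper
# derives its activity-specific keywords from keyword-frequency analysis).
ACTIVITIES = ["airway_assessment", "blood_pressure", "log_roll"]
KEYWORDS = ["airway", "breath", "pressure", "cuff", "roll", "back"]


def keyword_vector(transcript: str, vocab: Sequence[str]) -> List[float]:
    """Bag-of-keywords features: count of each vocabulary keyword in the text."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    return [float(tokens.count(word)) for word in vocab]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(
    audio_embedding: Sequence[float],
    keyword_features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Tuple[str, List[float]]:
    """Assumed late fusion: concatenate both branch outputs, apply one linear layer."""
    fused = list(audio_embedding) + list(keyword_features)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature length")
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIVITIES[best], probs


# Stand-in for the modified VGG output on one audio segment (hypothetical).
audio_embedding = [0.2, 0.1]
# Rows: one per activity; columns: 2 audio dims followed by 6 keyword counts.
weights = [
    [0.5, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # airway_assessment
    [0.0, 0.5, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],  # blood_pressure
    [0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0],  # log_roll
]
bias = [0.0, 0.0, 0.0]

features = keyword_vector("OK, roll him on his back on three, roll!", KEYWORDS)
label, probs = predict(audio_embedding, features, weights, bias)
print(label, [round(p, 3) for p in probs])
```

Concatenation keeps the two branches independent until the final layer, so either branch can be swapped or ablated; in a trained system the weights would be learned jointly rather than set by hand as here.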