Li Xinyu, Zhang Yanyi, Li Mengzhu, Chen Shuhong, Austin Farneth R, Marsic Ivan, Burd Randall S
Rutgers University, Piscataway, NJ, USA.
Children's National Medical Center, Washington, DC, USA.
Ubiquitous Comput Electron Mob Commun Conf (UEMCON) IEEE Annu. 2016 Oct;2016. doi: 10.1109/UEMCON.2016.7777912. Epub 2016 Dec 12.
We present a multimodal deep-learning architecture that automatically predicts the phases of the trauma resuscitation process in real time. The system first preprocesses the audio and video streams captured by a Kinect's built-in microphone array and depth sensor. A multimodal deep-learning structure then extracts audio and video features, which are combined through a "slow fusion" model. A modified softmax classification layer then makes the final phase decision from the combined features. The model was trained on 20 trauma resuscitation cases (>13 hours of data) and tested on five additional cases. Our results showed over 80% online detection accuracy with an F-score of 0.7, outperforming previous systems.
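The slow-fusion idea described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact architecture: the per-window concatenation of modalities, the pairwise merging of adjacent time windows, the tanh nonlinearity, the weight shapes, and the number of phases are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def slow_fusion_predict(video_feats, audio_feats, W_merge, W_out):
    """Hypothetical slow-fusion classifier sketch.

    video_feats, audio_feats: (T, D) arrays of per-window features
    from the two modalities.
    W_merge: (4*D, 2*D) weights that merge two adjacent fused windows
    back down to one window's feature size.
    W_out: (2*D, num_phases) output weights for the softmax layer.
    """
    # Early step: concatenate modalities within each time window -> (T, 2*D).
    x = np.concatenate([video_feats, audio_feats], axis=1)
    # "Slow" temporal fusion: repeatedly merge neighboring windows,
    # so temporal context grows gradually layer by layer.
    while x.shape[0] > 1:
        pairs = [np.concatenate([x[i], x[i + 1]]) for i in range(x.shape[0] - 1)]
        x = np.tanh(np.stack(pairs) @ W_merge)  # back to (T-1, 2*D)
    # Final decision from the fully fused representation.
    logits = x[0] @ W_out
    return softmax(logits)
```

Each pass through the loop widens the temporal receptive field by one window, which is the distinguishing property of slow fusion compared with fusing all time steps in a single early or late step.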