College of Information Sciences and Technology, The Pennsylvania State University, University Park, Pennsylvania 16803, USA.
Med Image Anal. 2022 Aug;80:102522. doi: 10.1016/j.media.2022.102522. Epub 2022 Jun 25.
In an emergency room (ER) setting, stroke triage or screening is a common challenge. A quick CT is usually performed instead of MRI because of MRI's slow throughput and high cost. Clinical assessments are commonly relied on during the process, but the misdiagnosis rate remains high. We propose a novel multimodal deep learning framework, DeepStroke, for computer-aided stroke presence assessment that recognizes patterns of minor facial muscle incoordination and speech impairment in patients with suspected stroke in an acute setting. DeepStroke takes as input one minute of facial video and audio data readily acquired during stroke triage, using the video for local facial paralysis detection and the audio for global speech disorder analysis. Transfer learning is adopted to reduce face-attribute biases and improve generalizability. We leverage multimodal lateral fusion to combine low- and high-level features and provide mutual regularization for joint training. Novel adversarial training is introduced to obtain identity-free, stroke-discriminative features. Experiments on our video-audio dataset of actual ER patients show that DeepStroke outperforms state-of-the-art models and achieves better performance than both a triage team and ER doctors, attaining 10.94% higher sensitivity and 7.37% higher accuracy than traditional stroke triage when specificity is matched. Each assessment can be completed in less than six minutes, demonstrating the framework's strong potential for clinical translation.
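The abstract names two architectural ideas: lateral fusion of low- and high-level video/audio features, and adversarial training that keeps the fused representation stroke-discriminative while stripping patient-identity cues. The following is a minimal PyTorch sketch of those two ideas only, not the authors' implementation; the branch depths, feature dimensions, the gradient-reversal adversary, and the identity-classifier head are illustrative assumptions.

```python
# Sketch (assumed, not the paper's code): lateral fusion of video/audio features
# plus a gradient-reversal adversary that discourages identity information.
import torch
import torch.nn as nn
from torch.autograd import Function


class GradReverse(Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class Branch(nn.Module):
    """Two-stage encoder exposing both a low-level and a high-level feature."""
    def __init__(self, in_dim, mid_dim=128, out_dim=64):
        super().__init__()
        self.low = nn.Sequential(nn.Linear(in_dim, mid_dim), nn.ReLU())
        self.high = nn.Sequential(nn.Linear(mid_dim, out_dim), nn.ReLU())

    def forward(self, x):
        low = self.low(x)
        high = self.high(low)
        return low, high


class LateralFusionNet(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, n_subjects=100):
        super().__init__()
        self.video = Branch(video_dim)
        self.audio = Branch(audio_dim)
        fused_dim = (128 + 64) * 2                              # low + high from both branches
        self.stroke_head = nn.Linear(fused_dim, 2)              # stroke vs. non-stroke
        self.identity_head = nn.Linear(fused_dim, n_subjects)   # adversarial identity classifier

    def forward(self, video_feat, audio_feat, lambd=1.0):
        v_low, v_high = self.video(video_feat)
        a_low, a_high = self.audio(audio_feat)
        # Lateral fusion: concatenate low- and high-level features of both modalities.
        fused = torch.cat([v_low, v_high, a_low, a_high], dim=1)
        stroke_logits = self.stroke_head(fused)
        # The identity head learns to predict identity, but the reversed gradient
        # pushes the encoders to discard identity cues from the fused features.
        identity_logits = self.identity_head(GradReverse.apply(fused, lambd))
        return stroke_logits, identity_logits


if __name__ == "__main__":
    model = LateralFusionNet()
    video_feat = torch.randn(4, 512)   # e.g. pooled per-clip facial-video embeddings
    audio_feat = torch.randn(4, 128)   # e.g. pooled speech/spectrogram embeddings
    stroke_logits, identity_logits = model(video_feat, audio_feat)
    loss = nn.functional.cross_entropy(stroke_logits, torch.tensor([0, 1, 0, 1])) \
         + nn.functional.cross_entropy(identity_logits, torch.randint(0, 100, (4,)))
    loss.backward()  # identity gradient arrives sign-reversed at the shared encoders
```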