Department of Intensive Care and Intermediate Care, University Hospital RWTH Aachen, Pauwelsstraße 30, 52072, Aachen, Germany.
Research Area Information Theory and Systematic Design of Communication Systems, RWTH Aachen University, Kopernikusstraße 16, 52074, Aachen, Germany.
Sci Rep. 2023 Jan 17;13(1):928. doi: 10.1038/s41598-022-26155-5.
In this work, we propose a framework to enhance the communication abilities of speech-impaired patients in an intensive care setting via lip reading. Medical procedures such as a tracheotomy cause the patient to lose the ability to utter speech while having little to no impact on habitual lip movement. Consequently, we developed a framework that predicts silently spoken text by performing visual speech recognition, i.e., lip reading. In a two-stage architecture, frames of the patient's face are used to infer audio features as an intermediate prediction target, which are then used to predict the uttered text. To the best of our knowledge, this is the first approach to bring visual speech recognition into an intensive care setting. For this purpose, we recorded an audio-visual dataset in the intensive care unit (ICU) of the University Hospital RWTH Aachen, with a language corpus hand-picked by experienced clinicians to be representative of their day-to-day routine. With a word error rate of 6.3%, the trained system reaches sufficient overall performance to significantly improve the quality of communication between patients and clinicians or relatives.
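A minimal sketch of the two-stage idea described in the abstract, written in PyTorch: stage one maps face/mouth frames to audio features (mel-spectrogram frames are assumed here as the intermediate target), and stage two decodes those features into text via a CTC-style character output. All module names, layer sizes, and the choice of mel features and CTC decoding are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class VideoToAudioFeatures(nn.Module):
    """Stage 1 (sketch): sequence of mouth-region frames -> audio features
    (e.g., mel-spectrogram frames) used as an intermediate prediction target."""

    def __init__(self, n_mels: int = 80):
        super().__init__()
        # Per-frame visual encoder (grayscale mouth crops assumed).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal model over the per-frame embeddings.
        self.temporal = nn.GRU(64, 128, batch_first=True, bidirectional=True)
        self.to_mels = nn.Linear(256, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 1, height, width)
        b, t = frames.shape[:2]
        x = self.frame_encoder(frames.flatten(0, 1)).flatten(1)  # (b*t, 64)
        x, _ = self.temporal(x.view(b, t, -1))                   # (b, t, 256)
        return self.to_mels(x)                                   # (b, t, n_mels)


class AudioFeaturesToText(nn.Module):
    """Stage 2 (sketch): decode predicted audio features into text, here with
    log-probabilities over a character vocabulary suitable for a CTC loss."""

    def __init__(self, n_mels: int = 80, vocab_size: int = 40):
        super().__init__()
        self.encoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(256, vocab_size)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        x, _ = self.encoder(mels)
        return self.classifier(x).log_softmax(-1)  # (b, t, vocab)


if __name__ == "__main__":
    stage1, stage2 = VideoToAudioFeatures(), AudioFeaturesToText()
    dummy_frames = torch.randn(2, 75, 1, 64, 64)  # 2 clips, 75 frames each
    mels = stage1(dummy_frames)                   # (2, 75, 80)
    log_probs = stage2(mels)                      # (2, 75, 40)
    print(mels.shape, log_probs.shape)
```

In such a setup the two stages can be trained separately (stage one against ground-truth audio features, stage two against transcripts) or jointly end-to-end; the abstract does not specify which, so this sketch leaves the training regime open.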
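For reference, the word error rate (WER) reported above is conventionally defined as the word-level edit distance between hypothesis and reference, normalized by the number of reference words: WER = (substitutions + deletions + insertions) / reference length. A minimal illustrative implementation (the function name and example strings are assumptions):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("turn on the light", "turn of the light"))  # 0.25
```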