Synchronous Analysis of Speech Production and Lips Movement to Detect Parkinson's Disease Using Deep Learning Methods.

Author information

Ríos-Urrego Cristian David, Escobar-Grisales Daniel, Orozco-Arroyave Juan Rafael

Affiliations

GITA Lab., Faculty of Engineering, University of Antioquia, Medellín 050010, Colombia.

LME Lab., University of Erlangen, 91054 Erlangen, Germany.

Publication information

Diagnostics (Basel). 2024 Dec 31;15(1):73. doi: 10.3390/diagnostics15010073.

Abstract

BACKGROUND/OBJECTIVES: Parkinson's disease (PD) affects more than 6 million people worldwide. Accurate diagnosis and monitoring are key to reducing its economic burden. Typical approaches use either speech signals or video recordings of the face to automatically model abnormal patterns in PD patients.

METHODS

This paper introduces, for the first time, a methodology that synchronously fuses information extracted from speech recordings and the corresponding videos of lip movement, referred to as the bimodal approach.

RESULTS

Our results indicate that the introduced method is more accurate and suitable than unimodal approaches or classical asynchronous approaches that combine both sources of information but do not incorporate the underlying temporal information.

CONCLUSIONS

This study demonstrates that using a synchronous fusion strategy with concatenated projections based on attention mechanisms, i.e., speech-to-lips and lips-to-speech, exceeds previous results reported in the literature. Complementary information between lip movement and speech production is confirmed when advanced fusion strategies are employed. Finally, multimodal approaches, combining visual and speech signals, showed great potential to improve PD classification, generating more confident and robust models for clinical diagnostic support.
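The abstract does not describe the architecture in detail. As an illustration only, the following is a minimal pure-Python sketch of bidirectional cross-attention fusion with concatenated outputs. It assumes that the speech and lip embedding sequences are already time-aligned (same number of frames) and share the same embedding dimension, that "speech-to-lips" means speech frames act as queries over lip frames, and it omits the learned query/key/value projections a trained model would use. The names `cross_attention` and `bimodal_fusion` are hypothetical and are not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(query: Matrix, context: Matrix) -> Matrix:
    # Scaled dot-product attention where `context` serves as both keys and
    # values (simplification: no learned projection matrices).
    d = len(query[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(query, transpose(context))
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, context)


def bimodal_fusion(speech: Matrix, lips: Matrix) -> Matrix:
    # Synchronous fusion assumes one speech frame per lip frame.
    if len(speech) != len(lips):
        raise ValueError("speech and lip sequences must be time-aligned")
    speech_to_lips = cross_attention(speech, lips)  # speech queries lip frames
    lips_to_speech = cross_attention(lips, speech)  # lip frames query speech
    # Concatenate both projections frame by frame: T x 2d output.
    return [a + b for a, b in zip(speech_to_lips, lips_to_speech)]
```

For example, three aligned frames of 2-dimensional speech and lip embeddings yield three fused frames of dimension 4, which a downstream classifier could then map to a PD/healthy decision. Keeping the two attention directions separate and concatenating them (rather than summing) preserves which modality provided the query.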

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb42/11720596/586879973fa2/diagnostics-15-00073-g001.jpg