Ríos-Urrego Cristian David, Escobar-Grisales Daniel, Orozco-Arroyave Juan Rafael
GITA Lab., Faculty of Engineering, University of Antioquia, Medellín 050010, Colombia.
LME Lab., University of Erlangen, 91054 Erlangen, Germany.
Diagnostics (Basel). 2024 Dec 31;15(1):73. doi: 10.3390/diagnostics15010073.
BACKGROUND/OBJECTIVES: Parkinson's disease (PD) affects more than 6 million people worldwide. Accurate diagnosis and monitoring are key to reducing its economic burden. Typical approaches consider either speech signals or video recordings of the face to automatically model abnormal patterns in PD patients.
This paper introduces, for the first time, a methodology that performs the synchronous fusion of information extracted from speech recordings and the corresponding videos of lip movement, i.e., a bimodal approach.
Our results indicate that the introduced method outperforms both unimodal approaches and classical asynchronous approaches that combine the two sources of information but discard the underlying temporal alignment.
This study demonstrates that a synchronous fusion strategy based on concatenated attention projections, i.e., speech-to-lips and lips-to-speech, exceeds previous results reported in the literature. The complementarity of lip movement and speech production is confirmed when advanced fusion strategies are employed. Finally, multimodal approaches combining visual and speech signals showed great potential to improve PD classification, yielding more reliable and robust models for clinical diagnostic support.
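The abstract does not specify the architecture, but the core idea it names, concatenating two attention-based projections (speech frames attending over lip frames and vice versa) from time-aligned feature sequences, can be illustrated with a minimal sketch. All names, feature dimensions, and the single-head scaled dot-product formulation below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    # query: (T, d) frames of one modality; key_value: (T, d) frames of the other.
    # Scaled dot-product attention projects `query` onto the other modality.
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)   # (T, T) frame-to-frame affinities
    weights = softmax(scores, axis=-1)          # each query frame's attention weights
    return weights @ key_value                  # (T, d) attended representation

def bimodal_fusion(speech_feats, lip_feats):
    # Synchronous fusion: the two sequences are assumed time-aligned (same T).
    s2l = cross_attention(speech_feats, lip_feats)  # speech-to-lips projection
    l2s = cross_attention(lip_feats, speech_feats)  # lips-to-speech projection
    # Concatenate both projections frame by frame -> (T, 2d) fused features.
    return np.concatenate([s2l, l2s], axis=-1)
```

In a full pipeline, the fused `(T, 2d)` sequence would feed a downstream classifier; the point of the sketch is only that both cross-modal projections are computed per frame and concatenated, so temporal correspondence between audio and video is preserved rather than averaged away.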