Gu Ying, Ying Jie, Chen Quan, Yang Hui, Wu Jingnan, Chen Nan, Li Yiming
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093 China.
School of Medical Devices, Shanghai University of Medicine & Health Sciences, Shanghai, 201318 China.
Biomed Eng Lett. 2024 Nov 28;15(1):261-272. doi: 10.1007/s13534-024-00444-6. eCollection 2025 Jan.
Alzheimer's disease (AD) is a neurodegenerative disorder with an irreversible progression. It is currently diagnosed using invasive and costly methods such as cerebrospinal fluid analysis, neuroimaging, and neuropsychological assessments. Recent studies indicate that certain changes in language ability can predict early cognitive decline, highlighting the potential of speech analysis for AD recognition. On this premise, this study proposes a multi-channel network framework for AD recognition, referred to as ADNet. It integrates both time-domain and frequency-domain features of speech signals, using waveform images and log-Mel spectrograms derived from raw speech as data sources. The framework employs inverted residual blocks to enhance the learning of low-level time-domain features and uses gated multi-information units to effectively combine local and global frequency-domain features. We evaluate the framework on a dataset from the Shanghai Cognitive Screening (SCS) digital neuropsychological assessment. The results show that the proposed method outperforms existing speech-based methods, achieving an accuracy of 88.57%, a precision of 88.67%, and a recall of 88.64%. This study demonstrates that the proposed framework can effectively distinguish AD patients from normal controls and may be useful for developing early recognition tools for AD.
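The log-Mel spectrogram used as the frequency-domain input can be illustrated with a minimal NumPy sketch. The FFT size, hop length, and number of mel bands below are illustrative assumptions, not the settings reported in the paper, and the filterbank construction follows the standard triangular-filter recipe rather than the authors' exact pipeline.

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the waveform, apply a Hann window, and take the power spectrum.
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into mel bands, then apply log compression.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# Example: one second of a 440 Hz tone at 16 kHz (placeholder for raw speech).
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (frames, mel bands)
```

In the ADNet setting described above, an image of this spectrogram would feed the frequency-domain channel while a rendering of the raw waveform feeds the time-domain channel; production code would typically use a library such as librosa for the feature extraction.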