Liu Jiamin, Fu Fan, Li Liang, Yu Junxiao, Zhong Dacheng, Zhu Songsheng, Zhou Yuxuan, Liu Bin, Li Jianqing
Jiangsu Province Engineering Research Center of Smart Wearable and Rehabilitation Devices, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China.
The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing 211166, China.
Brain Sci. 2023 Mar 11;13(3):477. doi: 10.3390/brainsci13030477.
Clinical studies have shown that speech pauses can reflect differences in cognitive function between patients with Alzheimer's Disease (AD) and non-AD patients, but the value of pause information in AD detection has not been fully explored. Herein, we propose a speech pause feature extraction and encoding strategy for AD detection based on acoustic signals alone. First, a voice activity detection (VAD) method was constructed to detect pause and non-pause segments and encode them as binary pause sequences that are easier to compute. Then, an ensemble machine-learning approach was proposed for classifying AD from the participants' spontaneous speech, based on the VAD Pause feature sequence and common acoustic feature sets (ComParE and eGeMAPS). The proposed pause feature sequence was verified in five machine-learning models. The validation data included two public challenge datasets (ADReSS and ADReSSo, English speech) and a local dataset (10 audio recordings from five patients and five controls, Chinese speech). Results showed that the VAD Pause feature was more effective than the common feature sets (ComParE: 6373 features; eGeMAPS: 88 features) for AD classification, and that the ensemble method improved accuracy by more than 5% compared to several baseline methods (8% on the ADReSS dataset; 5.9% on the ADReSSo dataset). Moreover, the pause-sequence-based AD detection method achieved 80% accuracy on the local dataset. Our study further demonstrates the potential of pause information in speech-based AD detection and contributes a more accessible and general pause feature extraction and encoding method for AD detection.
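The idea of encoding pauses as a binary sequence can be illustrated with a minimal sketch. The abstract does not specify the VAD method, so the code below assumes a simple frame-energy threshold; the function names, the 25 ms frame length, and the threshold ratio are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def encode_pauses(samples, frame_len=400, threshold_ratio=0.1):
    """Return one bit per frame: 1 = pause, 0 = speech.

    Assumption (not from the paper): a frame counts as a pause when its
    energy is below threshold_ratio times the energy of the loudest frame.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    return [1 if e < threshold else 0 for e in energies]


def pause_runs(bits):
    """Lengths, in frames, of each consecutive run of pause frames."""
    runs, current = [], 0
    for b in bits:
        if b == 1:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def synthetic_signal(sample_rate=16000):
    """Toy input: 0.5 s tone, 0.25 s silence, 0.5 s tone."""
    half_second = sample_rate // 2
    tone = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(half_second)]
    silence = [0.0] * (sample_rate // 4)
    return tone + silence + tone


if __name__ == "__main__":
    bits = encode_pauses(synthetic_signal())
    print("".join(str(b) for b in bits))
    print("pause run lengths (frames):", pause_runs(bits))
```

On the toy signal, the 0.25 s gap is encoded as ten consecutive 1 bits at 25 ms per frame. A sequence of this kind, or summary statistics derived from it such as pause count and run lengths, could then be passed to a classifier; real recordings would likely need a noise-robust VAD rather than a fixed energy ratio.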