School of Public Health, Hangzhou Normal University, Hangzhou, China.
Department of Mathematics and Computer Science, Fujian Provincial Key Laboratory of Data-Intensive Computing, Quanzhou Normal University, Quanzhou, China.
Front Public Health. 2022 Apr 13;10:772592. doi: 10.3389/fpubh.2022.772592. eCollection 2022.
Alzheimer's disease (AD) is a neurodegenerative disease that is difficult to detect with convenient and reliable methods. Language change in patients with AD is an important signal of their cognitive status and can potentially support early diagnosis. In this study, we developed a transfer learning model based on speech and natural language processing (NLP) technology for the early diagnosis of AD. The lack of large datasets limits the use of complex neural network models without feature engineering, and transfer learning can effectively address this problem. The model is first pre-trained on large text datasets to obtain a pre-trained language model, and an AD classification model is then trained on top of it using a small training set. Concretely, embeddings from a distilled bidirectional encoder representation model (distilBert), combined with a logistic regression classifier, are used to distinguish patients with AD from normal controls. The model was evaluated on the 2020 Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge dataset, which is balanced between 78 healthy controls (HC) and 78 patients with AD. The proposed model reaches an accuracy of 0.88, which is almost equivalent to the champion score in the challenge and a considerable improvement over the baseline of 75% established by the challenge organizers. The transfer learning method in this study thus improves AD prediction: it not only reduces the need for feature engineering but also addresses the lack of sufficiently large datasets.
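The pipeline described in the abstract (frozen embeddings from a pre-trained language model, followed by a logistic regression classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the checkpoint name `distilbert-base-uncased`, the first-token pooling, the learning rate, and the epoch count are all assumptions, and the small pure-Python `TinyLogisticRegression` is a hypothetical stand-in for whatever library classifier the study used.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Embedder = Callable[[Sequence[str]], List[Vector]]


def distilbert_embed(texts: Sequence[str]) -> List[Vector]:
    """Return one frozen DistilBERT vector per transcript.

    Assumptions (not from the paper): the checkpoint name and the use of the
    first-token hidden state as the transcript embedding.
    """
    # Imported lazily so the rest of the sketch runs without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # the language model stays frozen; no fine-tuning
        hidden = model(**enc).last_hidden_state
    return hidden[:, 0, :].tolist()


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyLogisticRegression:
    """Binary logistic regression trained with full-batch gradient descent."""

    def __init__(self, lr: float = 0.5, epochs: int = 2000) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0

    def predict_proba(self, X: Sequence[Vector]) -> List[float]:
        return [
            _sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b) for x in X
        ]

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "TinyLogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            # Gradient of the mean log-loss: (prediction - label) * feature.
            errors = [p - t for p, t in zip(self.predict_proba(X), y)]
            for j in range(d):
                self.w[j] -= self.lr * sum(e * x[j] for e, x in zip(errors, X)) / n
            self.b -= self.lr * sum(errors) / n
        return self

    def predict(self, X: Sequence[Vector]) -> List[int]:
        return [1 if p >= 0.5 else 0 for p in self.predict_proba(X)]


def train_ad_classifier(
    texts: Sequence[str], labels: Sequence[int], embed: Embedder = distilbert_embed
) -> TinyLogisticRegression:
    """Embed transcripts with a frozen model, then fit the classifier (1 = AD, 0 = HC)."""
    return TinyLogisticRegression().fit(embed(texts), labels)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

With the default embedder, `train_ad_classifier(transcripts, labels)` would download the checkpoint and embed each transcript before fitting; passing a different `embed` callable lets the classifier be exercised without the language model. Because only the small classifier is trained, the approach remains feasible on a dataset of 156 speakers.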