Santander-Cruz Yamanki, Salazar-Colores Sebastián, Paredes-García Wilfrido Jacobo, Guendulain-Arenas Humberto, Tovar-Arriaga Saúl
Facultad de Ingeniería, Universidad Autónoma de Querétaro, Querétaro C.P. 76010, Mexico.
Centro de Investigaciones en Óptica, León C.P. 37150, Mexico.
Brain Sci. 2022 Feb 15;12(2):270. doi: 10.3390/brainsci12020270.
Dementia is a neurodegenerative disease that leads to cognitive deficits such as aphasia, apraxia, and agnosia. It is currently considered one of the most significant medical problems worldwide and primarily affects the elderly. The condition gradually impairs the patient's cognition, eventually leaving them unable to perform everyday tasks without assistance. Because dementia is incurable, early detection plays an important role in delaying its progression, and tools and methods have therefore been developed to help diagnose patients accurately in the early stages. State-of-the-art methods have shown that syntactic linguistic features provide a sensitive and noninvasive tool for detecting dementia early; however, these methods lack relevant semantic information. In this work, we propose a novel methodology based on semantic features, using sentence embeddings computed by Siamese BERT networks (SBERT) together with support vector machine (SVM), K-nearest neighbors (KNN), random forest, and artificial neural network (ANN) classifiers. Our methodology extracts 17 features that provide demographic, lexical, syntactic, and semantic information from 550 oral production samples of elderly controls and people with Alzheimer's disease, taken from the DementiaBank Pitt Corpus database. To quantify the relevance of the extracted features for the dementia classification task, we calculated the mutual information score, which shows a dependence between our features and the Mini-Mental State Examination (MMSE) score. The experimental classification metrics (an accuracy of 77%, and a precision, recall, and F1 score of 80% each) show that our methodology performs better than syntax-based methods and the BERT approach when only linguistic features are used.
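Two of the quantitative steps named in the abstract can be made concrete without the authors' pipeline: scoring feature relevance with mutual information and summarizing classifier output with accuracy, precision, recall, and F1. The following is a minimal standard-library Python sketch under stated assumptions, not the paper's implementation. It assumes each feature and the MMSE score have already been discretized into bins, and it uses a simple plug-in (histogram) estimate of mutual information in nats. The function names `mutual_information` and `classification_metrics` and the toy data are hypothetical. Library estimators for continuous features (for example, the nearest-neighbor estimator in scikit-learn's `mutual_info_classif` and `mutual_info_regression`) work differently, so their values would not match this sketch exactly.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Assumption (hypothetical setup): a feature and the MMSE score have
    already been binned into discrete categories.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    pxy = Counter(zip(xs, ys))   # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), rewritten in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical binned feature values and binned MMSE categories
    feature = ["low", "low", "high", "high"]
    mmse_bin = ["impaired", "impaired", "normal", "normal"]
    print("MI (nats):", round(mutual_information(feature, mmse_bin), 3))

    # Hypothetical labels: 1 = Alzheimer's disease, 0 = control
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, equal precision and recall of 80% necessarily give an F1 score of 80%, which is consistent with the figures reported above.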