Department of Medical Informatics, College of Medicine, The Catholic University of Korea, 222, Banpo-daero, Seocho-gu, Seoul, 06591, Republic of Korea.
Intellicode Corp., 105, Gwanggyo-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16229, Republic of Korea.
Comput Biol Med. 2024 Sep;180:108950. doi: 10.1016/j.compbiomed.2024.108950. Epub 2024 Aug 2.
Detecting and analyzing Alzheimer's disease (AD) in its early stages is a crucial challenge. Speech data from AD patients can aid diagnosis because speech features show common patterns independent of race and spoken language. However, previous models for diagnosing AD from speech data have often focused on the characteristics of a single language, with no guarantee that they extend to other languages. In this study, we applied the same acoustic feature extraction method to datasets in two languages to diagnose AD.
Using Korean and English speech datasets, we applied ten models capable of classifying AD and healthy controls in real time, regardless of language. Four were machine learning models based on hand-crafted features, while the remaining six were deep learning models that used non-explainable features.
The highest accuracy achieved by the machine learning models was 0.73 and 0.69 for the Korean and English speech datasets, respectively. The deep learning models reached maximum accuracies of 0.75 and 0.78, with minimum classification times of 0.01 s and 0.02 s. These findings indicate that the models are robust across Korean and English and can support real-time diagnosis of AD from a 30-s voice sample.
Non-explainable deep learning models that learn voice representations directly outperformed machine learning models that use hand-crafted features in AD diagnosis. In addition, these AI models suggest that extension to language-agnostic AD diagnosis is possible.
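As an illustration only, the following sketch shows one way the two reported quantities, classification accuracy and minimum per-clip classification time, could be measured for any classifier. Everything in it is an assumption rather than the study's method: the sample rate, the zero-crossing-rate feature, the threshold rule `toy_classifier`, and the synthetic sine clips that stand in for 30-s voice samples. None of these carry clinical meaning.

```python
import math
import time

SAMPLE_RATE = 8000   # assumed; the abstract does not state a sample rate
CLIP_SECONDS = 30    # clip length mentioned in the abstract


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings / (len(signal) - 1)


def toy_classifier(signal, threshold=0.05):
    """Placeholder rule with no clinical meaning: 1 if ZCR exceeds threshold."""
    return 1 if zero_crossing_rate(signal) > threshold else 0


def evaluate(classify, clips, labels):
    """Return (accuracy, minimum per-clip classification time in seconds)."""
    correct = 0
    times = []
    for clip, label in zip(clips, labels):
        start = time.perf_counter()
        prediction = classify(clip)
        times.append(time.perf_counter() - start)
        correct += int(prediction == label)
    return correct / len(clips), min(times)


def sine_clip(freq_hz):
    """Synthetic stand-in for a voice sample: a pure tone of CLIP_SECONDS."""
    n = SAMPLE_RATE * CLIP_SECONDS
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Two synthetic clips with arbitrary labels, used only to exercise the code.
clips = [sine_clip(100), sine_clip(400)]
labels = [0, 1]

accuracy, min_time = evaluate(toy_classifier, clips, labels)
print(f"accuracy={accuracy:.2f}, min classification time={min_time:.4f} s")
```

Timing only the call to `classify` keeps data loading out of the measurement, which is one reasonable reading of "classification time"; the abstract does not specify what the reported times include.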