Predicting Alzheimer's Disease from Spoken and Written Language Using Fusion-Based Stacked Generalization.

Affiliations

School of Computer Science, Queensland University of Technology, Brisbane 4001, Australia; The Australian e-Health Research Centre, CSIRO, Brisbane 4029, Australia.

School of Computer Science, Queensland University of Technology, Brisbane 4001, Australia.

Publication information

J Biomed Inform. 2021 Jun;118:103803. doi: 10.1016/j.jbi.2021.103803. Epub 2021 May 19.

Abstract

The importance of automating the diagnosis of Alzheimer's disease (AD) to facilitate its early prediction has long been emphasized, but progress has been hampered in part by a lack of empirical support. Given the evident association of AD with age and the growth of the aging population owing to the general well-being of individuals, the estimated economic burden is unprecedented. Consequently, many recent studies have attempted to exploit the language deficits caused by cognitive decline to automate the diagnostic task by training machine learning (ML) algorithms on linguistic patterns and deficits. In this study, we aim to develop multiple heterogeneous stacked fusion models that harness the advantages of several base learning algorithms to improve the overall generalizability and robustness of AD diagnostic ML models. We used two different datasets in parallel, one written and one spoken, to train our stacked fusion models. Further, we examined the effect of linking these two datasets to develop a hybrid stacked fusion model that can predict AD from both written and spoken language. Our feature spaces involved two widely used types of linguistic patterns: lexicosyntactic features and character n-grams. We first investigated the lexicosyntactics of AD alongside healthy controls (HC), exploring a few new lexicosyntactic features, and then optimized the lexicosyntactic feature space by proposing a correlation feature selection technique that eliminates features based on their feature-feature inter-correlations and feature-target correlations according to a certain threshold. Our stacked fusion models establish benchmarks on both datasets, with AUCs of 98.1% and 99.47% for the spoken and written datasets, respectively, and corresponding accuracy and F1 scores of around 95% on the spoken dataset and around 97% on the written dataset. Likewise, the hybrid stacked fusion model on the linked data performs strongly, with 99.2% AUC and accuracy and F1 scores of around 97%. In view of the achieved performance and the enhanced generalizability of such fusion models over single classifiers, this study suggests replacing the initial traditional screening test with such models, which can be embedded into an online format for fully automated remote diagnosis.
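
The correlation feature selection idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the thresholds or the elimination order, so the greedy strategy, the threshold values (`min_target_corr`, `max_inter_corr`), the feature names, and the toy data below are all assumptions made for illustration. The sketch drops features that correlate weakly with the target, then drops features that correlate too strongly with a feature already kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, min_target_corr=0.1, max_inter_corr=0.9):
    """Greedy correlation filter (an assumed strategy, not taken from the paper).

    columns: dict mapping feature name -> list of values, one per sample.
    target:  list of class labels (0 = HC, 1 = AD), one per sample.
    """
    # Feature-target correlation: how informative each feature is on its own.
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}

    # Discard weakly relevant features, then visit the rest from most to least relevant.
    candidates = sorted(
        (name for name in columns if relevance[name] >= min_target_corr),
        key=lambda name: -relevance[name],
    )

    kept = []
    for name in candidates:
        # Feature-feature correlation: skip features redundant with one already kept.
        redundant = any(
            abs(pearson(columns[name], columns[other])) > max_inter_corr
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data with hypothetical lexicosyntactic features for six samples.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "type_token_ratio": [0.8, 0.7, 0.75, 0.5, 0.4, 0.45],
    "type_token_ratio_x2": [1.6, 1.4, 1.5, 1.0, 0.8, 0.9],  # redundant rescaled copy
    "pronoun_ratio": [0.1, 0.3, 0.2, 0.3, 0.2, 0.4],
    "uninformative": [1, 2, 3, 3, 2, 1],  # uncorrelated with the target
}

kept = select_features(columns, target)
print(kept)
```

On this toy data the uninformative feature is removed by the feature-target threshold, and only one of the two type-token-ratio variants survives the feature-feature threshold. Visiting candidates in order of relevance means that, of two redundant features, the more informative one is retained; other orderings are equally plausible readings of the abstract.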
