Clarke Natasha, Foltz Peter, Garrard Peter
Neurosciences Research Centre, Molecular & Clinical Sciences Research Institute, St George's, University of London, Cranmer Terrace, London, UK.
Institute of Cognitive Science, University of Colorado, Boulder, USA.
Cortex. 2020 Aug;129:446-463. doi: 10.1016/j.cortex.2020.05.001. Epub 2020 May 19.
Natural Language Processing (NLP) is an ever-growing field of computational science that aims to model natural human language. Combined with advances in machine learning, which learns patterns in data, it offers practical capabilities including automated language analysis. These approaches, which offer fast, replicable and objective methods, have garnered interest from clinical researchers seeking to understand the breakdown of language due to pathological changes in the brain. The study of Alzheimer's disease (AD) and preclinical Mild Cognitive Impairment (MCI) suggests that changes in discourse (connected speech or writing) may be key to early detection of disease. There is currently no disease-modifying treatment for AD, the leading cause of dementia in people over the age of 65, but detecting those at risk of developing the disease could help with the identification and testing of medications that can take effect before the underlying pathology has irreversibly spread. We outline important components of natural language, as well as the NLP tools and approaches with which they can be extracted, analysed and used for disease identification and risk prediction. We review literature using these tools to model discourse across the spectrum of AD, including the contribution of machine learning approaches and Automatic Speech Recognition (ASR). We conclude that NLP and machine learning techniques are starting to greatly enhance research in the field, with measurable and quantifiable language components showing promise for early detection of disease, but research and practical challenges remain for the clinical implementation of these approaches. Challenges discussed include the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity and clinical acceptability.