Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, 72 E. Concord Street, Evans 636, Boston, MA, 02118, USA.
The Framingham Heart Study, Boston University, Boston, MA, 02118, USA.
Alzheimers Res Ther. 2021 Aug 31;13(1):146. doi: 10.1186/s13195-021-00888-3.
Reliable, affordable, and easy-to-use strategies for the detection of dementia are sorely needed. Digital technologies, such as individual voice recordings, offer an attractive modality for assessing cognition, but methods that can automatically analyze such data are not readily available.
We used 1264 voice recordings of neuropsychological examinations administered to participants from the Framingham Heart Study (FHS), a community-based longitudinal observational study. The recordings were 73 min long on average and contained at least two speakers (participant and examiner). Of the total, 483 recordings were of participants with normal cognition (NC), 451 of participants with mild cognitive impairment (MCI), and 330 of participants with dementia (DE). We developed two deep learning models, a two-level long short-term memory (LSTM) network and a convolutional neural network (CNN), that used the audio recordings to classify whether a recording came from a participant with NC or with DE, and to differentiate recordings of participants with DE from those without DE (NDE, i.e., NC + MCI). Based on 5-fold cross-validation, the LSTM model achieved a mean (± std) area under the receiver operating characteristic curve (AUC) of 0.740 ± 0.017, a mean balanced accuracy of 0.647 ± 0.027, and a mean weighted F1 score of 0.596 ± 0.047 in classifying cases with DE versus those with NC. The CNN model achieved a mean AUC of 0.805 ± 0.027, a mean balanced accuracy of 0.743 ± 0.015, and a mean weighted F1 score of 0.742 ± 0.033 on the same task. For classifying participants with DE versus NDE, the LSTM model achieved a mean AUC of 0.734 ± 0.014, a mean balanced accuracy of 0.675 ± 0.013, and a mean weighted F1 score of 0.671 ± 0.015, while the CNN model achieved a mean AUC of 0.746 ± 0.021, a mean balanced accuracy of 0.652 ± 0.020, and a mean weighted F1 score of 0.635 ± 0.031.
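The abstract does not specify the feature extraction or architecture details, so the following is only a minimal sketch, not the authors' implementation: a small CNN classifying log-mel spectrogram segments as DE versus NC, evaluated with 5-fold cross-validation and the same metrics quoted above (AUC, balanced accuracy, weighted F1). The segment shape, layer sizes, and synthetic data are illustrative assumptions.

```python
# Minimal sketch (not the authors' pipeline): a toy CNN over log-mel spectrogram
# segments, with 5-fold cross-validated AUC, balanced accuracy, and weighted F1.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score, balanced_accuracy_score, f1_score


class SpectrogramCNN(nn.Module):
    """Small CNN over (1, n_mels, n_frames) spectrogram segments (assumed shape)."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


def evaluate_cv(X: np.ndarray, y: np.ndarray, n_splits: int = 5, epochs: int = 5):
    """5-fold cross-validation returning mean/std of the abstract's three metrics."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    aucs, baccs, f1s = [], [], []
    for train_idx, test_idx in skf.split(X, y):
        model = SpectrogramCNN()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        Xtr = torch.tensor(X[train_idx], dtype=torch.float32)
        ytr = torch.tensor(y[train_idx], dtype=torch.long)
        for _ in range(epochs):          # full-batch training, for brevity
            opt.zero_grad()
            loss_fn(model(Xtr), ytr).backward()
            opt.step()
        with torch.no_grad():
            logits = model(torch.tensor(X[test_idx], dtype=torch.float32))
            prob_de = torch.softmax(logits, dim=1)[:, 1].numpy()
            pred = logits.argmax(dim=1).numpy()
        aucs.append(roc_auc_score(y[test_idx], prob_de))
        baccs.append(balanced_accuracy_score(y[test_idx], pred))
        f1s.append(f1_score(y[test_idx], pred, average="weighted"))
    return {name: (float(np.mean(vals)), float(np.std(vals)))
            for name, vals in [("auc", aucs), ("balanced_acc", baccs), ("weighted_f1", f1s)]}


if __name__ == "__main__":
    # Synthetic stand-in data: 100 segments of shape (1, 64, 128), 0 = NC, 1 = DE.
    # Real inputs would be features extracted from the FHS audio recordings.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1, 64, 128)).astype(np.float32)
    y = rng.integers(0, 2, size=100)
    print(evaluate_cv(X, y))
```

The same evaluation loop would apply to a sequence model such as the two-level LSTM described above, with the CNN swapped for a recurrent network over frame-level features.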
This proof-of-concept study demonstrates that automated, deep learning-driven processing of audio recordings from neuropsychological testing of individuals recruited in a community cohort setting can facilitate dementia screening.