Maclagan Laura C, Abdalla Mohamed, Harris Daniel A, Stukel Therese A, Chen Branson, Candido Elisa, Swartz Richard H, Iaboni Andrea, Jaakkimainen R Liisa, Bronskill Susan E
ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada.
Department of Computer Science, University of Toronto, Toronto, Canada.
J Healthc Inform Res. 2023 Jan 23;7(1):42-58. doi: 10.1007/s41666-023-00125-6. eCollection 2023 Mar.
Dementia and mild cognitive impairment can be underrecognized in primary care practice and research. Free-text fields in electronic medical records (EMRs) are a rich source of information that might support increased detection and enable a better understanding of populations at risk of dementia. We used natural language processing (NLP) to identify dementia-related features in EMRs and compared the performance of supervised machine learning models in classifying patients with dementia. We assembled a cohort of primary care patients aged 66 years and older in Ontario, Canada, from EMR notes collected until December 2016: 526 with dementia and 44,148 without dementia. We identified dementia-related features by applying published lists, clinician input, and NLP with word embeddings to free-text progress and consult notes, and organized the features into thematic groups. Using machine learning models, we compared the performance of these features in detecting dementia, both overall and during time periods relative to dementia case ascertainment in health administrative databases. Over 900 dementia-related features were identified and grouped into eight themes (including symptoms, social, function, and cognition). Using notes from all time periods, LASSO had the best performance (F1 score: 77.2%, sensitivity: 71.5%, specificity: 99.8%). Model performance was poor when notes written before case ascertainment were included (F1 score: 14.4%, sensitivity: 8.3%, specificity: 99.9%) but improved as later notes were added. While similar models may eventually improve recognition of cognitive issues and dementia in primary care EMRs, our findings suggest that further research is needed to identify which additional EMR components might be useful to promote early detection of dementia.
The online version contains supplementary material available at 10.1007/s41666-023-00125-6.
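The abstract reports F1 score, sensitivity, and specificity for a cohort in which dementia is rare (526 of 44,674 patients). The following is a minimal sketch, not the authors' code, of how these three metrics are computed from a confusion matrix and why specificity can stay near 100% while F1 is low. Only the cohort sizes come from the abstract; the counts of true and false positives are hypothetical values chosen for illustration, and the names `ConfusionCounts` and `example` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (dementia = positive class)."""
    tp: int  # patients with dementia flagged by the model
    fp: int  # patients without dementia flagged by the model
    fn: int  # patients with dementia missed by the model
    tn: int  # patients without dementia correctly not flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Share of true cases that the model detects (also called recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of non-cases that the model correctly leaves unflagged.
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    # Share of flagged patients who truly have dementia.
    return c.tp / (c.tp + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity; true negatives play no role.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# HYPOTHETICAL counts: the row totals (526 cases, 44,148 non-cases) match the
# cohort sizes in the abstract, but the split into tp/fn and fp/tn is invented.
example = ConfusionCounts(tp=44, fp=44, fn=482, tn=44_104)

if __name__ == "__main__":
    print(f"sensitivity: {sensitivity(example):.3f}")
    print(f"specificity: {specificity(example):.3f}")
    print(f"precision:   {precision(example):.3f}")
    print(f"F1 score:    {f1_score(example):.3f}")
```

Because F1 ignores true negatives, the large non-dementia group keeps specificity high without raising F1. In this hypothetical case, missing most true cases drives F1 down even though only about 0.1% of non-cases are flagged.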