Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing.

Affiliations

Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, United States.

Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States.

Publication information

J Am Med Inform Assoc. 2024 Oct 1;31(10):2217-2227. doi: 10.1093/jamia/ocae177.

Abstract

OBJECTIVES

Alzheimer's disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that have been shown to be critical for optimal cognitive function in old age. However, research on the association between sleep and AD incidence remains scarce. A major bottleneck for such research is that the traditional way of acquiring sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience. We aim to automate the extraction of specific sleep-related patterns, such as snoring, napping, poor sleep quality, daytime sleepiness, night wakings, other sleep problems, and sleep duration, from clinical notes of AD patients. These sleep patterns are hypothesized to play a role in the incidence of AD, and extracting them could provide insight into the relationship between sleep and AD onset and progression.

MATERIALS AND METHODS

A gold standard dataset was created by manually annotating 570 randomly sampled clinical notes from adSLEEP, a corpus of 192 000 de-identified clinical notes from 7266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based natural language processing (NLP) algorithm, machine learning models, and large language model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset.
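The abstract does not describe the rules themselves, so the following is only a minimal sketch of what a rule-based extractor for these concepts might look like, assuming a keyword-and-regex design plus a simple look-back negation check in the spirit of NegEx. The patterns in `SLEEP_PATTERNS`, the `extract_sleep_concepts` function, and the 40-character negation window are all hypothetical choices for illustration, not the study's algorithm.

```python
import re

# Hypothetical regex patterns, one per sleep concept named in the abstract.
# These are illustrative assumptions, NOT the rules used in the study.
SLEEP_PATTERNS = {
    "snoring": r"\bsnor(?:e|es|ed|ing)\b",
    "napping": r"\bnap(?:s|ped|ping)?\b",
    "daytime_sleepiness": r"\b(?:daytime (?:sleepiness|drowsiness)|sleepy during the day)\b",
    "night_wakings": r"\bwak(?:es|ing) up (?:at night|during the night)\b|\bnight(?:time)? wak(?:ing|enings?)\b",
    "bad_sleep_quality": r"\b(?:poor|bad|restless) sleep\b",
    "sleep_problem": r"\b(?:insomnia|sleep apnea|trouble sleeping|difficulty sleeping)\b",
    "sleep_duration": r"\bsleeps? (?:about |around )?\d+(?:-\d+)? hours?\b",
}

# Assumed NegEx-style heuristic: a negation cue followed by up to 30
# non-period characters that run to the end of the look-back window,
# i.e. the cue sits in the same sentence, shortly before the match.
NEGATION_CUES = re.compile(r"\b(?:no|denies|denied|without|not)\b[^.]{0,30}$", re.IGNORECASE)
NEGATION_WINDOW = 40  # characters to look back before each match (assumption)


def extract_sleep_concepts(note: str) -> dict[str, list[str]]:
    """Return the affirmed (non-negated) matches for each concept in one note."""
    found: dict[str, list[str]] = {}
    for concept, pattern in SLEEP_PATTERNS.items():
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            window = note[max(0, match.start() - NEGATION_WINDOW):match.start()]
            if NEGATION_CUES.search(window):
                continue  # e.g. "Denies daytime sleepiness" is skipped
            found.setdefault(concept, []).append(match.group(0))
    return found


if __name__ == "__main__":
    # Invented example sentence, not taken from adSLEEP.
    note = ("Pt snores loudly per daughter. Denies daytime sleepiness. "
            "Takes naps in the afternoon. Sleeps about 6 hours a night.")
    print(extract_sleep_concepts(note))
```

Here the negated mention of daytime sleepiness is dropped while snoring, napping, and the stated sleep duration are kept. A real system would need far richer patterns and handling of hypothetical or family-history mentions.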

RESULTS

The annotated dataset of 482 patients comprised a predominantly White (89.2%), older adult population with an average age of 84.7 years; 64.1% were female, and the vast majority were non-Hispanic or Latino (94.6%). The rule-based NLP algorithm achieved the best F1 score across all sleep-related concepts. In terms of positive predictive value (PPV), the rule-based NLP algorithm achieved the highest PPV for daytime sleepiness (1.00) and sleep duration (1.00), the machine learning models had the highest PPV for napping (0.95) and bad sleep quality (0.86), and LLAMA2 with fine-tuning had the highest PPV for night wakings (0.93) and sleep problem (0.89).
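PPV and F1 can rank methods differently, which is how one algorithm can lead on F1 for every concept while other models post the highest PPV for some concepts. The minimal Python sketch below computes per-concept PPV, sensitivity, and F1 from binary labels. The `score_concept` function and the framing of each note as a single present/absent label per concept are assumptions, since the abstract does not state the unit of evaluation, and the labels used are toy values, not study data.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    ppv: float          # precision: TP / (TP + FP)
    sensitivity: float  # recall: TP / (TP + FN)
    f1: float           # harmonic mean of PPV and sensitivity


def score_concept(gold: list[bool], pred: list[bool]) -> Scores:
    """Score one concept given gold and predicted present/absent labels per note."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    # Define each ratio as 0.0 when its denominator is zero.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return Scores(ppv, sensitivity, f1)


if __name__ == "__main__":
    # Toy labels for illustration only.
    balanced = score_concept([True, True, False, False, True],
                             [True, False, False, True, True])
    cautious = score_concept([True, True, True, False],
                             [True, False, False, False])
    print(balanced)
    print(cautious)
```

In the toy data, the "cautious" predictor flags only one of three true mentions and never fires wrongly, so its PPV is 1.0 but its F1 is only 0.5. The "balanced" predictor has PPV and sensitivity of 2/3 each, giving an F1 of 2/3. A perfect PPV therefore does not by itself imply the best F1.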

DISCUSSION

Although sleep information is infrequently documented in clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches performed poorly, owing to the small amount of sleep information in the training data.

CONCLUSION

The results show that the rule-based NLP algorithm consistently achieved the best F1 performance for all sleep concepts. This study focused on the clinical notes of patients with AD, but the approach could be extended to general sleep information extraction for other diseases.
