School of Nursing, Columbia University, New York, New York, USA.
School of Nursing, University of Virginia, Charlottesville, Virginia, USA.
J Am Med Inform Assoc. 2019 Apr 1;26(4):364-379. doi: 10.1093/jamia/ocy173.
Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives.
Our search of PubMed and EMBASE returned 1964 records, which were screened down to 27 eligible articles. For each study, we extracted data on its purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators.
Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third of the studies (n = 9) reported patient demographic characteristics.
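To make the "manually curated rule-based processing" category more concrete, here is a minimal Python sketch. It is not drawn from any of the reviewed studies. The lexicon `SYMPTOM_LEXICON`, the cue list `NEGATION_CUES`, the functions `extract_symptoms` and `_is_negated`, and the six-token window are all hypothetical. The negation check is a crude fixed-window rule loosely inspired by NegEx, not NegEx itself. The sketch maps surface terms in a note to canonical symptoms and flags mentions that appear to be negated.

```python
import re

# Hypothetical, manually curated lexicon: surface term -> canonical symptom.
SYMPTOM_LEXICON = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "dyspnea": "dyspnea",
    "pain": "pain",
    "nausea": "nausea",
    "dizziness": "dizziness",
    "dizzy": "dizziness",
    "constipation": "constipation",
    "depressed mood": "depressed mood",
}

# Hypothetical negation cues (a small illustrative subset, not a validated list).
NEGATION_CUES = ("no", "not", "denies", "without", "negative for")


def _is_negated(preceding_text: str, window: int) -> bool:
    """Crude rule: a cue within the last `window` tokens before the mention negates it."""
    scope = " ".join(preceding_text.split()[-window:])
    return any(re.search(rf"\b{re.escape(cue)}\b", scope) for cue in NEGATION_CUES)


def extract_symptoms(note: str, window: int = 6) -> list[tuple[str, bool]]:
    """Return (canonical symptom, negated?) pairs in order of appearance."""
    results = []
    # Split into rough sentences so negation never crosses a sentence boundary.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        hits = []
        for term, symptom in SYMPTOM_LEXICON.items():
            for match in re.finditer(rf"\b{re.escape(term)}\b", sentence):
                negated = _is_negated(sentence[: match.start()], window)
                hits.append((match.start(), symptom, negated))
        # Sort by character offset so output follows the note's word order.
        results.extend((symptom, negated) for _, symptom, negated in sorted(hits))
    return results


if __name__ == "__main__":
    note = "Patient reports nausea and dizziness. Denies shortness of breath or chest pain."
    print(extract_symptoms(note))
    # [('nausea', False), ('dizziness', False), ('dyspnea', True), ('pain', True)]
```

Rule-based pipelines like this are transparent and easy to audit, but their coverage depends entirely on the curated lexicon and cue lists. This is one reason that openly sharing such vocabularies and rules matters for reuse across studies.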
NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves.
Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should also be made to examine patient characteristics and to make symptom-related NLP algorithms, pipelines, and vocabularies openly available.