Development of a natural language processing algorithm to extract seizure types and frequencies from the electronic health record.

Author affiliations

Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States; Department of Neurological Sciences, University of Vermont Medical Center, Burlington, VT, United States.

Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States.

Publication information

Seizure. 2022 Oct;101:48-51. doi: 10.1016/j.seizure.2022.07.010. Epub 2022 Jul 20.

DOI:10.1016/j.seizure.2022.07.010

PMID:35882104

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9547963/

Abstract

OBJECTIVE

To develop a natural language processing (NLP) algorithm to abstract seizure types and frequencies from electronic health records (EHR).

BACKGROUND

Seizure frequency measurement is an epilepsy quality metric. Yet, abstraction of seizure frequency from the EHR is laborious. We present an NLP algorithm to extract seizure data from unstructured text of clinic notes. Algorithm performance was assessed at two epilepsy centers.

METHODS

We developed a rules-based NLP algorithm to recognize terms related to seizures and frequency within the text of an outpatient encounter. Algorithm output (e.g. number of seizures of a particular type within a time interval) was compared to seizure data manually annotated by two expert reviewers ("gold standard"). The algorithm was developed from 150 clinic notes from institution #1 (development set), then tested on a separate set of 219 notes from institution #1 (internal test set) with 248 unique seizure frequency elements. The algorithm was separately applied to 100 notes from institution #2 (external test set) with 124 unique seizure frequency elements. Algorithm performance was measured by recall (sensitivity), precision (positive predictive value), and F1 score (geometric mean of precision and recall).
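To make the idea of a rules-based extractor concrete, the following is a minimal sketch of how a seizure count, seizure type, and time interval might be pulled from a sentence with a regular expression. It is not the authors' algorithm: the vocabulary in `SEIZURE_TYPES`, the pattern, the `SeizureFrequency` record, and the sample note are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple


class SeizureFrequency(NamedTuple):
    """One extracted element: N seizures of a given type per interval."""
    count: int
    seizure_type: str
    interval: str


# Hypothetical vocabulary; a real rule set would be far larger.
SEIZURE_TYPES = [
    "focal impaired awareness",
    "focal aware",
    "generalized tonic-clonic",
    "absence",
    "myoclonic",
]

# Assumed phrasing: "<count> <type> seizure(s) per|a|each|every <interval>".
PATTERN = re.compile(
    r"(?P<count>\d+)\s+"
    r"(?P<type>" + "|".join(re.escape(t) for t in SEIZURE_TYPES) + r")\s+"
    r"seizures?\s+"
    r"(?:per|a|each|every)\s+"
    r"(?P<interval>day|week|month|year)",
    re.IGNORECASE,
)


def extract_frequencies(note: str) -> List[SeizureFrequency]:
    """Return every count/type/interval triple the pattern finds in the note."""
    return [
        SeizureFrequency(
            count=int(m.group("count")),
            seizure_type=m.group("type").lower(),
            interval=m.group("interval").lower(),
        )
        for m in PATTERN.finditer(note)
    ]


if __name__ == "__main__":
    # Invented example text, not taken from any clinic note.
    note = (
        "Patient reports 3 focal aware seizures per month and "
        "1 generalized tonic-clonic seizure a year. No absence seizures."
    )
    for element in extract_frequencies(note):
        print(element)
```

A single fixed pattern like this only matches phrasing it was written for, which is one plausible reason a rule set tuned on one institution's notes could miss differently worded notes elsewhere.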

RESULTS

In the internal test set, the algorithm demonstrated 70% recall (173/248), 95% precision (173/182), and 0.82 F1 score compared to manual review. Algorithm performance in the external test set was lower with 22% recall (27/124), 73% precision (27/37), and 0.40 F1 score.
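The reported metrics can be recomputed from the counts given above. The sketch below derives recall and precision from the true-positive, gold-standard, and algorithm-output counts, then combines them two ways: as the harmonic mean, which is the conventional definition of F1, and as the geometric mean, which is how the abstract describes its F1 score. The function names and the `REPORTED` dictionary are assumptions made for this example.

```python
import math
from typing import Dict, Tuple

# (true positives, gold-standard elements, elements output by the algorithm),
# taken from the fractions reported in the abstract.
REPORTED: Dict[str, Tuple[int, int, int]] = {
    "internal": (173, 248, 182),
    "external": (27, 124, 37),
}


def precision_recall(tp: int, gold_total: int, predicted_total: int) -> Tuple[float, float]:
    """Precision = TP / predicted; recall = TP / gold."""
    return tp / predicted_total, tp / gold_total


def harmonic_mean(p: float, r: float) -> float:
    """Conventional F1 score."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def geometric_mean(p: float, r: float) -> float:
    """Geometric mean of precision and recall."""
    return math.sqrt(p * r)


def summarize(tp: int, gold_total: int, predicted_total: int) -> Dict[str, float]:
    """Collect all four quantities for one test set."""
    p, r = precision_recall(tp, gold_total, predicted_total)
    return {
        "recall": r,
        "precision": p,
        "harmonic": harmonic_mean(p, r),
        "geometric": geometric_mean(p, r),
    }


if __name__ == "__main__":
    for name, counts in REPORTED.items():
        stats = summarize(*counts)
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

With these counts the harmonic mean comes to roughly 0.80 (internal) and 0.34 (external), while the geometric mean comes to roughly 0.81 and 0.40, so the reported F1 values of 0.82 and 0.40 appear closer to the geometric mean, possibly computed from the rounded percentages.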

CONCLUSIONS

These results suggest NLP extraction of seizure types and frequencies is feasible, though not without challenges in generalizability for large-scale implementation.
