Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States; Department of Neurological Sciences, University of Vermont Medical Center, Burlington, VT, United States.
Department of Neurology, University of Pennsylvania, Philadelphia, PA, United States.
Seizure. 2022 Oct;101:48-51. doi: 10.1016/j.seizure.2022.07.010. Epub 2022 Jul 20.
To develop a natural language processing (NLP) algorithm to abstract seizure types and frequencies from electronic health records (EHR).
Seizure frequency measurement is an epilepsy quality metric, yet abstracting seizure frequency from the EHR is laborious. We present an NLP algorithm that extracts seizure data from the unstructured text of clinic notes. Algorithm performance was assessed at two epilepsy centers.
We developed a rules-based NLP algorithm to recognize terms related to seizures and seizure frequency within the text of outpatient encounter notes. Algorithm output (e.g., the number of seizures of a particular type within a time interval) was compared with seizure data manually annotated by two expert reviewers (the "gold standard"). The algorithm was developed on 150 clinic notes from institution #1 (development set) and then tested on a separate set of 219 notes from institution #1 (internal test set) containing 248 unique seizure frequency elements. The algorithm was also applied to 100 notes from institution #2 (external test set) containing 124 unique seizure frequency elements. Algorithm performance was measured by recall (sensitivity), precision (positive predictive value), and F1 score (geometric mean of precision and recall).
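The abstract does not list the study's actual rules, so the following is only a minimal sketch of what a rules-based extractor of this kind can look like, assuming a small hypothetical vocabulary (`SEIZURE_TYPES`, `NUMBER_WORDS`, `INTERVALS`) and a single regular-expression pattern. Each match is turned into a (count, seizure type, interval) element, the same shape as the algorithm output described above. The names `SeizureFrequency` and `extract` are illustrative, not the authors' code.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical term lists; the study's real vocabulary is not given in the abstract.
SEIZURE_TYPES = {
    "focal": "focal",
    "absence": "absence",
    "myoclonic": "myoclonic",
    "gtc": "generalized tonic-clonic",
    "generalized tonic-clonic": "generalized tonic-clonic",
}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
INTERVALS = ("day", "week", "month", "year")


@dataclass(frozen=True)
class SeizureFrequency:
    count: int          # number of seizures reported
    seizure_type: str   # normalized seizure type
    interval: str       # time interval the count refers to


# Longest type terms first so multi-word terms win over shorter ones.
_TYPE_ALT = "|".join(re.escape(t) for t in sorted(SEIZURE_TYPES, key=len, reverse=True))
_NUM_ALT = r"\d+|" + "|".join(NUMBER_WORDS)

# Matches phrasings such as "3 focal seizures per month" or
# "two GTC seizures in the past week"; anything else is ignored.
PATTERN = re.compile(
    rf"\b(?P<count>{_NUM_ALT})\s+(?P<type>{_TYPE_ALT})\s+seizures?\s+"
    rf"(?:per|a|each|in\s+the\s+(?:past|last))\s+"
    rf"(?P<interval>{'|'.join(INTERVALS)})s?\b",
    re.IGNORECASE,
)


def extract(note: str) -> List[SeizureFrequency]:
    """Return every seizure-frequency element the pattern finds in a note."""
    elements = []
    for m in PATTERN.finditer(note):
        raw = m.group("count").lower()
        count = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
        stype = SEIZURE_TYPES[m.group("type").lower()]
        elements.append(SeizureFrequency(count, stype, m.group("interval").lower()))
    return elements


if __name__ == "__main__":
    note = "Reports 3 focal seizures per month and two GTC seizures in the past week."
    for element in extract(note):
        print(element)
```

Hard-coded patterns like these only match phrasings anticipated during development (this sketch ignores negation, ranges such as "2-3 seizures", and "seizure-free" statements), which is one plausible reason a rule set can transfer less well to notes written at another institution.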
In the internal test set, the algorithm achieved 70% recall (173/248), 95% precision (173/182), and an F1 score of 0.82 compared with manual review. Performance in the external test set was lower, with 22% recall (27/124), 73% precision (27/37), and an F1 score of 0.40.
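The reported percentages can be recomputed from the counts. The sketch below assumes that the denominators 248 and 124 are the gold-standard elements, that 182 and 37 are the elements output by the algorithm, and that a hypothetical exact-match rule (`true_positives`) decides whether an output element agrees with the gold standard. It computes both the geometric mean, as stated in the methods, and the conventional F1 score, which is the harmonic mean of precision and recall.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def true_positives(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> int:
    # Hypothetical matching rule: an output element is correct only if it exactly
    # equals a gold-standard element; multiset intersection prevents double counting.
    return sum((Counter(predicted) & Counter(gold)).values())


def metrics(tp: int, n_gold: int, n_pred: int) -> Dict[str, float]:
    """Recall, precision, and two ways of combining them, from raw counts."""
    recall = tp / n_gold      # sensitivity: matched / gold-standard elements
    precision = tp / n_pred   # positive predictive value: matched / output elements
    return {
        "recall": recall,
        "precision": precision,
        "geometric_mean": math.sqrt(precision * recall),
        "harmonic_mean_f1": 2 * precision * recall / (precision + recall),
    }


internal = metrics(tp=173, n_gold=248, n_pred=182)
external = metrics(tp=27, n_gold=124, n_pred=37)

if __name__ == "__main__":
    for name, m in (("internal", internal), ("external", external)):
        print(name, {k: round(v, 2) for k, v in m.items()})
```

From the exact counts, the geometric mean is about 0.81 (internal) and 0.40 (external); the reported 0.82 matches the geometric mean of the rounded percentages, sqrt(0.70 × 0.95) ≈ 0.815. The harmonic-mean F1 from the same counts is about 0.80 (internal) and 0.34 (external).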
These results suggest that NLP extraction of seizure types and frequencies is feasible, although generalizability remains a challenge for large-scale implementation.