Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Epilepsy Res. 2024 Nov;207:107451. doi: 10.1016/j.eplepsyres.2024.107451. Epub 2024 Sep 10.
Monitoring seizure control metrics is key to clinical care of patients with epilepsy. Manually abstracting these metrics from unstructured text in electronic health records (EHR) is laborious. We aimed to abstract the date of last seizure and seizure frequency from clinical notes of patients with epilepsy using natural language processing (NLP).
We extracted seizure control metrics from notes of patients seen in epilepsy clinics at two hospitals in Boston. Both the date of last seizure and seizure frequency were extracted with the pretrained model RoBERTa_for_seizureFrequency_QA, combined with regular expressions. We designed the algorithm to categorize the timing of last seizure ("today", "1-6 days ago", "1-4 weeks ago", "more than 1-3 months ago", "more than 3-6 months ago", "more than 6-12 months ago", "more than 1-2 years ago", "more than 2 years ago") and seizure frequency ("innumerable", "multiple", "daily", "weekly", "monthly", "once per year", "less than once per year"). Our ground truth consisted of structured questionnaires filled out by physicians. Model performance was measured using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) for categorical labels, and median absolute error (MAE) for ordinal labels, with 95% confidence intervals (CI) estimated via bootstrapping.
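The categorization of last-seizure timing can be illustrated with a minimal sketch. The category labels are quoted from the text above; the day thresholds, the function name `categorize_last_seizure`, and the constant `LAST_SEIZURE_BINS` are assumptions made for illustration and are not taken from the study.

```python
from datetime import date

# Labels are quoted from the abstract. The upper bounds (in days) are
# assumed for illustration; the study's exact cut-offs are not stated.
LAST_SEIZURE_BINS = [
    (0, "today"),
    (6, "1-6 days ago"),
    (30, "1-4 weeks ago"),
    (90, "more than 1-3 months ago"),
    (180, "more than 3-6 months ago"),
    (365, "more than 6-12 months ago"),
    (730, "more than 1-2 years ago"),
]


def categorize_last_seizure(last_seizure: date, visit: date) -> str:
    """Map the gap between last seizure and visit date to a category label."""
    days = (visit - last_seizure).days
    if days < 0:
        raise ValueError("last seizure date is after the visit date")
    # Return the first bin whose (assumed) upper bound covers the gap.
    for upper, label in LAST_SEIZURE_BINS:
        if days <= upper:
            return label
    return "more than 2 years ago"
```

For example, `categorize_last_seizure(date(2022, 4, 28), date(2022, 5, 1))` falls in the "1-6 days ago" category under these assumed thresholds.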
Our cohort included 1773 adult patients with a total of 5658 visits with reported seizure control metrics, seen in epilepsy clinics between December 2018 and May 2022. The average age of the cohort was 42 years, and the majority of patients were female (57%), White (81%), and non-Hispanic (85%). The models achieved an MAE (95% CI) of 4 (4.00-4.86) weeks for date of last seizure and of 0.02 (0.02-0.02) seizures per day for seizure frequency.
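A median absolute error with a bootstrapped 95% CI, as reported above, could be computed along the following lines. This is a sketch under stated assumptions: it uses a percentile bootstrap that resamples individual visits, and the function names, the number of resamples, and the toy data in the usage note are all hypothetical. The study's actual resampling scheme is not described in the abstract.

```python
import random
import statistics


def median_absolute_error(y_true, y_pred):
    """Median of |truth - prediction| over paired observations."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median absolute error.

    Assumption: visits are resampled independently with replacement.
    """
    rng = random.Random(seed)
    n = len(y_true)
    estimates = []
    for _ in range(n_boot):
        # Draw n paired observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            median_absolute_error([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

With toy values such as `y_true = [1, 4, 8, 12, 30]` and `y_pred = [1, 6, 8, 10, 26]` (weeks since last seizure), `median_absolute_error` gives the point estimate and `bootstrap_ci` gives the interval around it.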
Our NLP approach demonstrates that extracting seizure control metrics from the EHR is feasible, which allows for large-scale EHR research.