Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA.
Harvard Medical School, Boston, Massachusetts, USA.
Epilepsia. 2023 Jun;64(6):1472-1481. doi: 10.1111/epi.17589. Epub 2023 Apr 4.
Unstructured data in electronic health records (EHR) are a rich source of medical information; however, abstracting them is labor-intensive. Automated EHR phenotyping (AEP) can reduce the need for manual chart review. We present an AEP model designed to automatically identify patients diagnosed with epilepsy.
The ground truth for model training and evaluation was derived from a combination of structured questionnaires completed by physicians for a subset of patients and manual chart review performed with customized software. Modeling features included indicators of the presence of keywords and phrases in unstructured clinical notes, prescriptions for antiseizure medications (ASMs), International Classification of Diseases (ICD) codes for seizures and epilepsy, the number of ASMs and of epilepsy-related ICD codes, age, and sex. Data were randomly divided into training (70%) and hold-out testing (30%) sets, with distinct patients in each set. We trained a regularized logistic regression model and an extreme gradient boosting model. Model performance was measured using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), with 95% confidence intervals (CI) estimated via bootstrapping.
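The evaluation design above (a patient-level 70/30 split, then AUROC and AUPRC with bootstrapped 95% CIs on the hold-out set) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, labels are assumed to be 0/1, the number of resamples (1000) and the percentile bootstrap are assumptions the abstract does not specify, and average precision is used here as a common step-wise estimate of AUPRC, which can differ slightly from other AUPRC estimators.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple


def patient_level_split(
    patient_ids: Sequence[str], test_frac: float = 0.3, seed: int = 0
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: shuffle distinct patient IDs and put each patient wholly in train or test."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_frac)
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, with tied scores given average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise AUPRC estimate (tied scores keep input order)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for y in y_true if y == 1)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a positive is retrieved
    return total / n_pos


def _percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_ci(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap (1 - alpha) CI for `metric`."""
    if len(set(y_true)) < 2:
        raise ValueError("need both positive and negative labels")
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that contain only one class
            stats.append(metric(y_b, [scores[i] for i in idx]))
    stats.sort()
    return (
        metric(y_true, scores),
        _percentile(stats, alpha / 2),
        _percentile(stats, 1 - alpha / 2),
    )
```

In use, `scores` would be a fitted model's predicted probabilities for the hold-out patients, and `bootstrap_ci(auroc, y_test, scores)` or `bootstrap_ci(average_precision, y_test, scores)` returns `(estimate, lower, upper)`. Splitting on patient IDs rather than on individual encounters keeps notes from the same patient from appearing in both training and testing, which is the leakage the "distinct patients in each set" design guards against.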
Our study cohort included 3903 adults drawn from the outpatient departments of nine hospitals between February 2015 and June 2022 (mean age = 47 ± 18 years, 57% women, 82% White, 84% non-Hispanic, 70% with epilepsy). The final models included 285 features, including 246 keywords and phrases captured from 8415 encounters. Both models achieved an AUROC and AUPRC of 1 (95% CI = .99-1.00) on the hold-out testing set.
A machine learning-based AEP approach accurately identifies patients with epilepsy from clinical notes, ICD codes, and ASM prescriptions. This model can enable large-scale epilepsy research using EHR databases.