Division of Child Neurology, Weill Cornell Medicine, New York, New York.
Department of Epidemiology, Columbia University Medical Center, New York, New York.
Epilepsia. 2019 Jun;60(6):1209-1220. doi: 10.1111/epi.15966. Epub 2019 May 21.
Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability.
Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic-clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions for the risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was measured by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance of <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than in home data. To demonstrate how small revisions can improve generalizability, we removed three "boilerplate" standard text phrases from away notes and repeated the performance evaluation.
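The evaluation pipeline described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's implementation: the regular expression, the boilerplate phrase, the sample notes, and all function names are hypothetical, and the F-measure is assumed to be the balanced F1 score (the harmonic mean of sensitivity and positive predictive value), which the abstract does not specify.

```python
import re

# Hypothetical pattern for one risk factor (generalized tonic-clonic seizures).
# The study's actual expressions are not given in the abstract.
GTC_PATTERN = re.compile(
    r"\b(?:generali[sz]ed\s+tonic[-\s]clonic|GTCS?|grand\s+mal)\b",
    re.IGNORECASE,
)

# Hypothetical boilerplate: standard text that names the risk factor
# without asserting that the patient has it.
BOILERPLATE = [
    "Seizure precautions were reviewed, including risk of "
    "generalized tonic-clonic seizures.",
]

# Synthetic notes with chart-review labels (True = risk factor present).
NOTES = [
    "Patient had a generalized tonic-clonic seizure last month.",
    "History of GTCs in childhood.",
    "Focal seizures only. " + BOILERPLATE[0],
    "No seizures since 2015.",
    "Convulsive seizure with bilateral stiffening and jerking.",
]
LABELS = [True, True, False, False, True]


def strip_boilerplate(note, phrases=BOILERPLATE):
    """Remove known boilerplate phrases before pattern matching."""
    for phrase in phrases:
        note = note.replace(phrase, " ")
    return note


def flag_note(note, pattern=GTC_PATTERN):
    """Return True if the pattern occurs anywhere in the note."""
    return pattern.search(note) is not None


def evaluate(predictions, labels):
    """Return (sensitivity, positive predictive value, F1)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    total = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f1


def is_generalizable(home_score, away_score, threshold=0.10):
    """Apply the abstract's criterion: absolute decrease below the threshold."""
    return (home_score - away_score) < threshold


# Score the synthetic notes with and without boilerplate removal.
raw_scores = evaluate([flag_note(n) for n in NOTES], LABELS)
clean_scores = evaluate([flag_note(strip_boilerplate(n)) for n in NOTES], LABELS)
```

In this toy data the boilerplate sentence produces a false positive, so stripping it raises positive predictive value without changing sensitivity, which mirrors the kind of error the abstract attributes to boilerplate phrases. Exact string replacement is used for simplicity; real notes would likely need tolerance for whitespace and punctuation variants.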
We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data.
Regular expressions are a feasible and probably generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.