Aronson Paul L, Kuppermann Nathan, Mahajan Prashant, Nielsen Blake, Olsen Cody S, Meeks Huong D, Grundmeier Robert W
Section of Pediatric Emergency Medicine, Departments of Pediatrics and of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut.
Departments of Pediatrics and of Emergency Medicine, University of California Davis School of Medicine, Sacramento, California.
Hosp Pediatr. 2025 Jan 1;15(1):e1-e5. doi: 10.1542/hpeds.2024-008051.
Natural language processing (NLP) can enhance research on febrile infants by enabling more comprehensive cohort identification. We aimed to refine and validate an NLP algorithm that identifies and extracts quantified temperature measurements for infants aged 90 days and younger who had fevers at home or in clinics before their emergency department (ED) visits.
We conducted a cross-sectional study using electronic health record (EHR) data from 17 EDs in 10 health systems that are part of the Pediatric Emergency Care Applied Research Network Registry. All visits between January 1, 2012, and May 31, 2023, for infants aged 90 days and younger were eligible, excluding those with trauma-related diagnoses. We iteratively refined a prespecified rules-based NLP algorithm in 7 successive samples of 200 visits and validated the algorithm on a held-out sample of 500 visits. The reference standard for pre-ED quantified temperature measurements was a temperature documented in clinical notes, excluding ED vital sign temperatures.
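As a rough illustration of what a rules-based approach to this task can look like, the sketch below scans note text for sentences that contain a fever-related cue word and pulls out numbers in a plausible temperature range. It is not the study's algorithm: the cue words, the numeric ranges, the unit inference, and the names `extract_temperatures` and `TempMention` are all assumptions made for this example.

```python
import re
from typing import List, NamedTuple


class TempMention(NamedTuple):
    """A temperature found in free text (hypothetical structure)."""
    value: float
    unit: str  # "C" or "F", inferred from the numeric range


# Assumed cue words; a real rule set would be far larger and iteratively refined.
CUE = re.compile(r"\b(?:temp(?:erature)?s?|fevers?|febrile|tmax)\b", re.IGNORECASE)

# A 2-3 digit number with optional decimals, not embedded in a longer
# number or a slash-delimited date.
NUMBER = re.compile(r"(?<![\d./])(\d{2,3}(?:\.\d{1,2})?)(?!\.?\d|/)")


def extract_temperatures(note: str) -> List[TempMention]:
    """Return temperature-like numbers from sentences that mention fever."""
    mentions: List[TempMention] = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?;])\s+", note):
        if not CUE.search(sentence):
            continue
        for match in NUMBER.finditer(sentence):
            value = float(match.group(1))
            # Assumed plausibility windows; the unit is inferred from the range.
            if 35.0 <= value <= 43.0:
                mentions.append(TempMention(value, "C"))
            elif 95.0 <= value <= 110.0:
                mentions.append(TempMention(value, "F"))
    return mentions


if __name__ == "__main__":
    note = "Mom reports fever to 100.8 at home yesterday. Seen by PMD on 1/12/2023."
    print(extract_temperatures(note))
```

A sketch this simple would misfire on other numbers that share a sentence with a cue word (a heart rate of 100, a gestational age of 40 weeks), which is the kind of error that iterative refinement against reviewed samples is meant to surface.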
In our final sample, 113 of 500 visits (23%) had quantified temperature measurements. The NLP algorithm had a sensitivity of 95% (95% CI: 88%-98%), a specificity of 96% (95% CI: 93%-97%), and a positive predictive value of 86% (95% CI: 78%-91%). When rules were applied to exclude temperatures that may have been recorded more than 24 hours earlier, the NLP algorithm had lower sensitivity (88%; 95% CI: 81%-93%) but similar specificity (97%; 95% CI: 95%-98%).
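For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and positive predictive value follow from a 2x2 table of algorithm output against the reference standard, with a confidence interval for each proportion. The abstract does not say which interval method was used; the Wilson score interval here is an assumption, as are the function names and the example counts, which are hypothetical and not the study's data.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Point estimates and Wilson intervals from 2x2 counts."""
    return {
        # Share of reference-positive visits that the algorithm flagged.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Share of reference-negative visits that the algorithm left unflagged.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Share of flagged visits that were truly reference-positive.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, (estimate, (low, high)) in diagnostic_metrics(tp=45, fp=5, fn=5, tn=145).items():
        print(f"{name}: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Positive predictive value depends on how common the finding is in the sample, so it can sit well below sensitivity and specificity when only a minority of visits are reference-positive.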
This highly accurate NLP algorithm can identify febrile infants whose fevers were not documented in the ED, facilitating their inclusion in large studies that use EHR data.