Department of Pediatrics, University of Rochester Medical Center, Rochester, New York, USA.
Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York, USA.
J Hosp Med. 2022 Jan;17(1):11-18. doi: 10.1002/jhm.2732. Epub 2022 Jan 4.
Diagnostic codes can retrospectively identify samples of febrile infants, but sensitivity is low, resulting in many febrile infants eluding detection. To ensure study samples are representative, an improved approach is needed.
To derive and internally validate a natural language processing algorithm to identify febrile infants and compare its performance to diagnostic codes.
This cross-sectional study included infants aged 0-90 days brought to one pediatric emergency department from January 2016 to December 2017. We aimed to identify infants with fever, defined as a documented temperature ≥38°C. We used clinical notes from 2017 to develop two rule-based algorithms to identify infants with fever and tested them on data from 2016. Using manual abstraction as the gold standard, we compared the performance of the two rule-based algorithms (Models 1, 2) with that of four previously published diagnostic code groups (Models 5-8) using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
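As an illustration of what a rule-based approach to fever detection in free-text notes can look like, the Python sketch below scans a note for temperature mentions and flags the note when any value reaches the ≥38°C threshold given above. It is a minimal sketch and not the study's Models 1 or 2, whose rules are not described here. The regular expression, the Fahrenheit conversion, the 30-45°C plausibility window, and the names `TEMP_PATTERN`, `to_celsius`, and `note_indicates_fever` are all assumptions made for this example.

```python
import re

FEVER_THRESHOLD_C = 38.0  # fever definition stated in the abstract

# Hypothetical pattern: a 2-3 digit number (optional decimals) followed by
# an optional degree marker and a C or F unit. Not the study's actual rules.
TEMP_PATTERN = re.compile(
    r"(\d{2,3}(?:\.\d+)?)\s*(?:°|degrees?\s*)?\s*([CF])\b",
    re.IGNORECASE,
)


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to Celsius; Fahrenheit values are converted."""
    if unit.upper() == "C":
        return value
    return (value - 32.0) * 5.0 / 9.0


def note_indicates_fever(note: str) -> bool:
    """Return True if any temperature mention in the note is >= 38 C."""
    for match in TEMP_PATTERN.finditer(note):
        temp_c = to_celsius(float(match.group(1)), match.group(2))
        # Assumed plausibility window to skip values that are unlikely
        # to be body temperatures.
        if 30.0 <= temp_c <= 45.0 and temp_c >= FEVER_THRESHOLD_C:
            return True
    return False
```

A real algorithm would also need to handle negation ("denies fever"), temperatures written without units, and historical or parental reports, none of which this sketch attempts.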
In the test set (n = 1190 infants), 184 infants (15.5%) were febrile. The AUCs (0.92-0.95) and sensitivities (86%-92%) of Models 1 and 2 were significantly greater than those of Models 5-8 (0.67-0.74; 20%-74%), with similar specificities (93%-99%). In contrast to samples from Models 5-8, samples from Models 1 and 2 had characteristics similar to those of the gold standard, including fever prevalence, median age, and rates of bacterial infections, hospitalizations, and severe outcomes.
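For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and AUC can be computed when a model's yes/no predictions are compared with a gold standard. For a classifier that outputs only a binary label, the ROC curve has a single operating point and the AUC reduces to the mean of sensitivity and specificity. The function name `binary_metrics` and the toy data are assumptions for illustration; the study's actual confusion matrices are not reported here. The last lines reproduce the reported fever prevalence from the counts in the text.

```python
def binary_metrics(predicted, actual):
    """Sensitivity, specificity, and AUC for binary predictions.

    `predicted` and `actual` are equal-length sequences of booleans;
    `actual` plays the role of the gold standard.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # With a single operating point, AUC is the mean of the two.
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, auc


# Toy data, invented for illustration only.
toy_predicted = [True, True, False, False, False]
toy_actual = [True, False, True, False, False]
toy_sens, toy_spec, toy_auc = binary_metrics(toy_predicted, toy_actual)

# Fever prevalence in the test set, from the counts given in the text.
prevalence_pct = round(184 / 1190 * 100, 1)
```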
Findings suggest rule-based algorithms can accurately identify febrile infants with greater sensitivity while preserving specificity compared to diagnostic codes. If externally validated, rule-based algorithms may be important tools to create representative study samples, thereby improving generalizability of findings.