Broad Allison, Luo Xiao, Tahabi Fattah Muhammad, Abdoo Denise, Zhang Zhan, Adelgais Kathleen
University of Colorado School of Medicine, Aurora, Colorado.
Department of Management Science & Information Systems, Oklahoma State University, Stillwater, Oklahoma.
Prehosp Emerg Care. 2025;29(3):227-237. doi: 10.1080/10903127.2025.2451209. Epub 2025 Jan 17.
Abusive head trauma (AHT) is a leading cause of death in young children. Analyses of the characteristics of patients presenting to Emergency Medical Services (EMS) are often limited to structured data fields. Artificial Intelligence (AI) and Large Language Models (LLMs) may identify rare presentations like AHT through factors not found in structured data. Our goal was to apply AI and LLMs to EMS narrative documentation of young children to detect AHT.
This is a retrospective cohort study of EMS transports of children <36 months of age with a diagnosis of head injury from the 2018-2019 ESO Research Data Collaborative. Non-abusive closed head injury (NA-CHI) was distinguished from AHT and child maltreatment (AHT-CAN) by 2 expert reviewers; the kappa statistic (κ) assessed inter-rater reliability. A Natural Language Processing (NLP) framework using an LLM augmented with expert-derived n-grams was developed to identify AHT-CAN. We compared the test characteristics (sensitivity, specificity, negative predictive value (NPV)) of this NLP framework with those of a Generative Pretrained Transformer (GPT) model and an n-grams-only model for detecting AHT-CAN. Association of specific word tokens with AHT-CAN was analyzed using Pearson's chi-square. Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC) are also reported.
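The two statistics named above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the counts in the usage lines are hypothetical, and the chi-square is the plain Pearson statistic for a 2x2 table without continuity correction (the abstract does not say whether a correction was applied).

```python
def cohen_kappa_2x2(both_yes, r1_only, r2_only, both_no):
    """Cohen's kappa for two raters making a binary call (e.g. AHT-CAN vs NA-CHI).

    both_yes: both reviewers labeled the encounter AHT-CAN
    r1_only:  only reviewer 1 labeled it AHT-CAN
    r2_only:  only reviewer 2 labeled it AHT-CAN
    both_no:  both reviewers labeled it NA-CHI
    """
    n = both_yes + r1_only + r2_only + both_no
    observed = (both_yes + both_no) / n
    r1_yes, r2_yes = both_yes + r1_only, both_yes + r2_only
    r1_no, r2_no = n - r1_yes, n - r2_yes
    # Agreement expected by chance from each rater's marginal rates.
    expected = (r1_yes * r2_yes + r1_no * r2_no) / n ** 2
    return (observed - expected) / (1 - expected)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a token-by-label 2x2 table.

    a: AHT-CAN narratives containing the token
    b: AHT-CAN narratives without it
    c: NA-CHI narratives containing the token
    d: NA-CHI narratives without it
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("chi-square is undefined when a margin is zero")
    return n * (a * d - b * c) ** 2 / margins


# Hypothetical counts, for illustration only (not study data).
print(round(cohen_kappa_2x2(20, 5, 10, 15), 2))
print(round(chi_square_2x2(20, 0, 0, 20), 1))
```

A token counts as positively associated with AHT-CAN when the statistic is significant and the token is more frequent in AHT-CAN narratives than in NA-CHI narratives; the statistic alone does not give the direction.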
There were 1082 encounters in our cohort: 1030 (95.2%) NA-CHI and 52 (4.8%) AHT-CAN. Inter-rater agreement was substantial (κ = 0.71). The augmented NLP framework had a sensitivity of 92.3% and a specificity of 72.4%, with an NPV of 99.5%. In comparison, the GPT model had a sensitivity of 69.2%, a specificity of 97.1%, and an NPV of 98.4%, and n-grams alone had a sensitivity of 53.8%, a specificity of 62.0%, and an NPV of 96.4%. AUROC was 0.91 and AUPRC was 0.52. A total of 44 n-grams and bi-grams were positively associated with AHT-CAN, including "domestic," "various," "bruise," "cheek," "multiple," "doa," "not respond," and "see EMS."
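The reported NPVs can be checked against the reported sensitivities, specificities, and class counts. The sketch below back-calculates whole-number confusion-matrix cells by rounding; those cell counts are an assumption, since the abstract reports only percentages, and the function names are illustrative.

```python
N_AHT_CAN = 52   # positives reported in the abstract
N_NA_CHI = 1030  # negatives reported in the abstract


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Back-calculate (TP, FN, TN, FP) from rates; the rounding is an assumption."""
    tp = round(n_pos * sensitivity)
    tn = round(n_neg * specificity)
    return tp, n_pos - tp, tn, n_neg - tn


def npv(tn, fn):
    """Negative predictive value: share of negative calls that are truly negative."""
    return tn / (tn + fn)


# (sensitivity, specificity) as reported for each model.
models = {
    "augmented NLP framework": (0.923, 0.724),
    "GPT model": (0.692, 0.971),
    "n-grams only": (0.538, 0.620),
}

for name, (sens, spec) in models.items():
    tp, fn, tn, fp = confusion_from_rates(N_AHT_CAN, N_NA_CHI, sens, spec)
    print(f"{name}: FN={fn}, TN={tn}, NPV={npv(tn, fn):.1%}")
```

Under this rounding, the recomputed NPVs come out at 99.5%, 98.4%, and 96.4%, matching the reported values. Because only 4.8% of encounters are AHT-CAN, NPV is high for all three models, so sensitivity is the more discriminating figure when comparing them.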
AI and LLMs have high sensitivity and specificity to detect AHT-CAN in EMS free-text narratives. Words associated with physical signs of trauma are strongly associated with AHT-CAN. LLMs augmented with a list of n-grams may help EMS identify signs of trauma that aid in the detection of AHT in young children.