Susan T. Parker
Feinberg School of Medicine, Northwestern University, 750 N Lakeshore, Chicago, IL, 60611, United States, +1 248 761 3116.
JMIR AI. 2025 Jun 19;4:e68212. doi: 10.2196/68212.
BACKGROUND: Law enforcement and coroner or medical examiner reports are now available for nearly every violent death in the United States. This expands the potential for natural language processing (NLP) research into violence.

OBJECTIVE: The objective of this work is to assess how supervised NLP can be applied to unstructured data in the National Violent Death Reporting System to predict the circumstances and types of violent death.

METHODS: This analysis applied distilBERT, a compact large language model (LLM) with fewer parameters than full-scale LLMs, to unstructured narrative data. The simulations examined how preprocessing, training data volume, and training data composition affect model performance, which was evaluated by F1-score, precision, recall, and false negative rate. Bias by race, ethnicity, and sex was assessed by comparing F1-scores across subgroups.

RESULTS: A compact LLM needed a training set of at least 1500 cases to achieve an F1-score of 0.6 and a false negative rate of 0.01 to 0.05. Replacing domain-specific jargon improved model performance. Oversampling positive class cases to address class imbalance did not substantially improve F1-scores. F1-score disparities ranged from 0.2 to 0.25 between racial and ethnic groups and from 0.12 to 0.2 between male and female decedents.

CONCLUSIONS: Compact LLMs with sufficient training data can be applied to supervised NLP tasks with class imbalance in the National Violent Death Reporting System. Simulating supervised text classification across the model-fitting process, from preprocessing through training, can guide the development of compact LLM-informed NLP applications for unstructured death narrative data.
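The abstract names four evaluation metrics and a subgroup comparison of F1-scores, but it does not give formulas. The sketch below shows how these quantities are commonly computed from binary predictions. It is an illustration under stated assumptions and does not reproduce the study's pipeline. It assumes that label 1 means the circumstance is present and that the false negative rate uses the standard definition FN / (FN + TP). It also assumes that the "disparity" between subgroups is the gap between the highest and lowest subgroup F1-scores. The function names (`binary_metrics`, `f1_gap_by_group`) and the toy data are hypothetical. The distilBERT model that would produce `y_pred` is not shown.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = circumstance present)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and false negative rate (assumed FN / (FN + TP))."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}


def f1_gap_by_group(y_true, y_pred, groups):
    """Per-subgroup F1 and the max-minus-min gap (hypothetical disparity measure)."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    f1s = {g: binary_metrics(ts, ps)["f1"] for g, (ts, ps) in by_group.items()}
    return f1s, max(f1s.values()) - min(f1s.values())


if __name__ == "__main__":
    # Toy labels and predictions, purely illustrative.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    groups = ["A", "A", "B", "B", "A", "B"]
    print(binary_metrics(y_true, y_pred))
    print(f1_gap_by_group(y_true, y_pred, groups))
```

In the toy data, subgroup B has no true positives, so its F1 falls to 0. In practice, the positive class would be rare, so very small subgroups can produce unstable F1 estimates like this one.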