Bayesian methods: a useful tool for classifying injury narratives into cause groups.

Author information

School of Industrial Engineering, Purdue University, 1287 Grissom Hall, West Lafayette, IN 47907, USA.

Publication information

Inj Prev. 2009 Aug;15(4):259-65. doi: 10.1136/ip.2008.021337.

Abstract

This study compared two Bayesian methods (Fuzzy and Naïve) for classifying injury narratives in large administrative databases into event cause groups. A dataset of 14 000 narratives was randomly extracted from claims filed with a workers' compensation insurance provider. Two expert coders assigned one-digit and two-digit Bureau of Labor Statistics (BLS) Occupational Injury and Illness Classification event codes to each narrative. The narratives were separated into a training set of 11 000 cases and a prediction set of 3000 cases. The training set was used to develop two Bayesian classifiers that assigned BLS codes to narratives. Each model was then evaluated on the prediction set. Both models performed well and tended to predict one-digit BLS codes more accurately than two-digit codes. For one-digit and two-digit codes respectively, the Fuzzy method had an overall sensitivity of 78% and 64%, a specificity of 93% and 95%, and a positive predictive value (PPV) of 78% and 65%. The Naïve method showed similar accuracy: a sensitivity of 80% and 70%, a specificity of 96% and 97%, and a PPV of 80% and 70%. For large administrative databases, Bayesian methods show significant promise as a means of classifying injury narratives into cause groups. Overall, Naïve Bayes provided slightly more accurate predictions than Fuzzy Bayes.
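
To make the workflow in the abstract concrete, the following is a minimal sketch of a Naïve Bayes narrative classifier together with the three reported metrics. It is a generic textbook multinomial Naïve Bayes with add-one smoothing, not the authors' implementation, and it does not cover the Fuzzy method. The toy narratives, the labels "fall" and "struck", and all function names are hypothetical and chosen only for illustration; they are not BLS event codes or data from the study.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(narratives, codes):
    """Count word occurrences per code from labeled narratives (hypothetical helper)."""
    class_counts = Counter(codes)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in zip(narratives, codes):
        words = text.lower().split()
        word_counts[code].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the code with the highest log posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, n in class_counts.items():
        score = math.log(n / total)  # log prior of the code
        denom = sum(word_counts[code].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[code][word] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


def one_vs_rest_metrics(actual, predicted, code):
    """Sensitivity, specificity, and PPV for one code against all others."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == code and p == code for a, p in pairs)
    fn = sum(a == code and p != code for a, p in pairs)
    fp = sum(a != code and p == code for a, p in pairs)
    tn = sum(a != code and p != code for a, p in pairs)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
    }


# Toy training set (made up); the study used 11 000 expert-coded narratives.
train_texts = [
    "slipped on wet floor and fell",
    "fell from ladder while painting",
    "struck by falling box",
    "hit by forklift in warehouse",
]
train_codes = ["fall", "fall", "struck", "struck"]
model = train_naive_bayes(train_texts, train_codes)

# Toy prediction set (made up), scored against its hypothetical expert labels.
test_texts = ["worker fell off ladder", "hit by box"]
test_codes = ["fall", "struck"]
predictions = [predict(model, t) for t in test_texts]
metrics = one_vs_rest_metrics(test_codes, predictions, "fall")
```

Sensitivity is the share of narratives with a given expert-assigned code that the model also assigns to that code, specificity is the share of narratives without that code that the model correctly leaves out, and PPV is the share of the model's assignments to that code that agree with the experts. How the study aggregated these per-code values into the overall figures is not stated in the abstract.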
