A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.

Author information

Center for Injury Epidemiology, Liberty Mutual Research Institute for Safety, 71 Frankland Road, Hopkinton, Massachusetts 01748, USA.

Publication information

Inj Prev. 2011 Dec;17(6):407-14. doi: 10.1136/ip.2010.030593. Epub 2011 Apr 11.

DOI:10.1136/ip.2010.030593

PMID:21482563

Abstract

BACKGROUND

Bayesian methods show promise for classifying injury narratives from large administrative datasets into cause groups. This study examined a combined approach where two Bayesian models (Fuzzy and Naïve) were used to either classify a narrative or select it for manual review.

METHODS

Injury narratives were extracted from claims filed with a workers' compensation insurance provider between January 2002 and December 2004. Narratives were separated into a training set (n=11,000) and a prediction set (n=3,000). Expert coders assigned two-digit Bureau of Labor Statistics Occupational Injury and Illness Classification event codes to each narrative. Fuzzy and Naïve Bayesian models were developed using manually classified cases in the training set. Two semi-automatic machine coding strategies were evaluated. The first strategy assigned cases for manual review if the Fuzzy and Naïve models disagreed on the classification. The second strategy selected additional cases for manual review from the cases on which the two models agreed (the Agree dataset), using prediction strength, to reach a level of 50% computer coding and 50% manual coding.
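The two filtering strategies described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Prediction` record, the function names, and the `strength` field are hypothetical, and the sketch assumes that the agreed cases with the lowest prediction strength are the ones sent for manual review (the abstract does not say how prediction strength was computed or ranked).

```python
from typing import List, NamedTuple, Tuple


class Prediction(NamedTuple):
    """One narrative with the event code predicted by each model (hypothetical record)."""
    case_id: str
    fuzzy_code: str   # two-digit event code from the Fuzzy Bayesian model
    naive_code: str   # two-digit event code from the Naive Bayesian model
    strength: float   # assumed: prediction strength of the classification


def split_by_agreement(
    preds: List[Prediction],
) -> Tuple[List[Prediction], List[Prediction]]:
    """Strategy 1: keep cases where both models agree, flag the rest for manual review."""
    agree = [p for p in preds if p.fuzzy_code == p.naive_code]
    disagree = [p for p in preds if p.fuzzy_code != p.naive_code]
    return agree, disagree


def top_up_manual_review(
    agree: List[Prediction],
    disagree: List[Prediction],
    manual_share: float = 0.5,
) -> Tuple[List[Prediction], List[Prediction]]:
    """Strategy 2: move the weakest agreed cases to manual review
    until roughly `manual_share` of all cases are manually coded."""
    total = len(agree) + len(disagree)
    target_manual = round(total * manual_share)
    extra = max(0, target_manual - len(disagree))
    ranked = sorted(agree, key=lambda p: p.strength)  # weakest predictions first
    manual = disagree + ranked[:extra]
    computer = ranked[extra:]
    return computer, manual
```

With a prediction set of 3,000 narratives and `manual_share=0.5`, the second function would leave 1,500 cases for the computer and 1,500 for human coders, matching the 50%/50% split stated in the methods.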

RESULTS

When agreement alone was used as the filtering strategy, the majority of narratives were coded by the computer (n=1,928, 64%), leaving 36% for manual review. The overall combined (human plus computer) sensitivity was 0.90, and positive predictive value (PPV) was >0.90 for 11 of 18 two-digit event categories. Implementing the second strategy improved results, with an overall sensitivity of 0.95 and PPV >0.90 for 17 of 18 categories.
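For readers unfamiliar with the two metrics reported above, the following sketch shows how per-category sensitivity and PPV are conventionally computed from expert codes and final assigned codes. The function name and input format are assumptions for illustration; the abstract does not describe how manually reviewed cases were scored in the combined figures, so the sketch simply compares whatever final code each narrative received against the expert code.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def sensitivity_and_ppv(
    true_codes: Sequence[str],
    assigned_codes: Sequence[str],
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Return {event_code: (sensitivity, ppv)} for every code that appears.

    sensitivity = correctly assigned cases / cases that truly belong to the code
    ppv         = correctly assigned cases / cases assigned to the code
    A metric is None when its denominator is zero.
    """
    true_pos: Counter = Counter()
    actual: Counter = Counter()
    predicted: Counter = Counter()
    for truth, assigned in zip(true_codes, assigned_codes):
        actual[truth] += 1
        predicted[assigned] += 1
        if truth == assigned:
            true_pos[truth] += 1

    result = {}
    for code in set(actual) | set(predicted):
        sens = true_pos[code] / actual[code] if actual[code] else None
        ppv = true_pos[code] / predicted[code] if predicted[code] else None
        result[code] = (sens, ppv)
    return result
```

A category can have high sensitivity but low PPV (or the reverse), which is why the abstract reports PPV per category alongside overall sensitivity.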

CONCLUSIONS

A combined Naïve-Fuzzy Bayesian approach can classify some narratives with high accuracy and identify others most beneficial for manual review, reducing the burden on human coders.
