Bayesian decision support for coding occupational injury data.

Author information

Nanda Gaurav, Grattan Kathleen M, Chu MyDzung T, Davis Letitia K, Lehto Mark R

Affiliations

School of Industrial Engineering, Purdue University, 315 N. Grant Street, West Lafayette, IN 47907-2023, USA.

Massachusetts Department of Public Health, 250 Washington Street, 4th Floor, Boston, MA 02108, USA.

Publication information

J Safety Res. 2016 Jun;57:71-82. doi: 10.1016/j.jsr.2016.03.001. Epub 2016 Mar 15.

DOI: 10.1016/j.jsr.2016.03.001

PMID: 27178082

Abstract

INTRODUCTION

Studies on autocoding injury data have found that machine learning algorithms perform well for frequently occurring categories but often struggle with rare ones. Manual coding, although resource-intensive, therefore cannot be eliminated. We propose a Bayesian decision support system that autocodes a large portion of the data, filters cases for manual review, and assists human coders by presenting them with the top k prediction choices and a confusion matrix of predictions from the Bayesian models.

METHOD

We studied the prediction performance of Single-Word (SW) and Two-Word-Sequence (TW) Naïve Bayes models on a sample of data from the 2011 Survey of Occupational Injuries and Illnesses (SOII). We used the agreement between the SW and TW predictions, together with various prediction strength thresholds, to autocode cases and to filter cases for manual review. We also studied the sensitivity of the top k predictions of the SW model, the TW model, and the SW-TW combination, and then compared the accuracy of the codes manually assigned to the SOII data with that of the proposed system.
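The difference between the SW and TW models lies in the features: SW uses individual words of the injury narrative, while TW uses pairs of adjacent words. The sketch below assumes a standard multinomial Naïve Bayes with add-one (Laplace) smoothing and ignores features unseen in training; the paper's exact estimator and preprocessing may differ. The class `NaiveBayesCoder`, the featurizer names, and the toy narratives and codes are hypothetical illustrations, not SOII data.

```python
import math
from collections import Counter, defaultdict


def single_words(text):
    """SW features: the individual lower-cased words of a narrative."""
    return text.lower().split()


def two_word_sequences(text):
    """TW features: each pair of adjacent words in a narrative."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayesCoder:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.code_counts = Counter()                # cases per code
        self.feature_counts = defaultdict(Counter)  # feature counts per code
        self.vocab = set()

    def fit(self, narratives, codes):
        for text, code in zip(narratives, codes):
            feats = self.featurize(text)
            self.code_counts[code] += 1
            self.feature_counts[code].update(feats)
            self.vocab.update(feats)
        return self

    def posteriors(self, text):
        """Return {code: P(code | narrative)}; features unseen in training are ignored."""
        total = sum(self.code_counts.values())
        feats = [f for f in self.featurize(text) if f in self.vocab]
        log_scores = {}
        for code, n in self.code_counts.items():
            denom = sum(self.feature_counts[code].values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                score += math.log((self.feature_counts[code][f] + 1) / denom)
            log_scores[code] = score
        # Normalize in log space to avoid underflow on long narratives.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

    def top_k(self, text, k):
        post = self.posteriors(text)
        return sorted(post, key=post.get, reverse=True)[:k]


# Hypothetical toy narratives with placeholder codes (not SOII data).
narratives = [
    "slipped on wet floor and fell",
    "fell from ladder while painting",
    "struck by falling box in warehouse",
    "hit by forklift in loading dock",
]
codes = ["FALL", "FALL", "STRUCK", "STRUCK"]
sw = NaiveBayesCoder(single_words).fit(narratives, codes)
tw = NaiveBayesCoder(two_word_sequences).fit(narratives, codes)
print(sw.top_k("worker fell from ladder", 1))  # → ['FALL']
print(tw.top_k("worker fell from ladder", 1))  # → ['FALL']
```

Because both models expose ranked posteriors, their top predictions can be compared for agreement and their probabilities used as prediction strengths.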

RESULTS

Assuming well-trained coders review only the 26% of cases flagged for review, the accuracy of the proposed system was estimated at 86.5%, comparable to the accuracy of the original coding of the data set (range: 73%-86.8%). Overall, the TW model had higher sensitivity than the SW model, and prediction accuracy increased both when the two models agreed and at higher prediction strength thresholds. The sensitivity of the top five predictions was 93%.
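Assuming that the "sensitivity of the top k predictions" means the share of cases whose correct code appears among a model's k highest-ranked codes (often called top-k accuracy or recall@k), it can be computed as in the sketch below. The function name and the four ranked outputs are hypothetical and serve only to show the calculation; they do not reproduce the study's data.

```python
def top_k_sensitivity(ranked_predictions, true_codes, k):
    """Share of cases whose true code is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_codes) or not true_codes:
        raise ValueError("need one ranked list per case and at least one case")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, true_codes))
    return hits / len(true_codes)


# Hypothetical ranked outputs for four cases, with placeholder codes.
ranked = [
    ["FALL", "STRUCK", "OVEREXERTION"],
    ["STRUCK", "FALL", "CONTACT"],
    ["OVEREXERTION", "FALL", "STRUCK"],
    ["CONTACT", "STRUCK", "FALL"],
]
truth = ["FALL", "FALL", "STRUCK", "FALL"]
for k in (1, 2, 3):
    print(k, top_k_sensitivity(ranked, truth, k))
# 1 0.25
# 2 0.5
# 3 1.0
```

Sensitivity can only rise as k grows, which is why presenting coders with a short ranked list (such as the top five) captures the correct code far more often than the single best prediction.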

CONCLUSIONS

The proposed system seems promising for coding injury data because it offers comparable accuracy with less manual coding.

PRACTICAL APPLICATIONS

Accurately and promptly coded occupational injury data are useful for surveillance and for prevention activities that aim to make workplaces safer.

Similar articles

A combined Fuzzy and Naive Bayesian strategy can be used to assign event codes to injury narratives.

Inj Prev. 2011 Dec;17(6):407-14. doi: 10.1136/ip.2010.030593. Epub 2011 Apr 11.

Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

Accid Anal Prev. 2017 Jan;98:359-371. doi: 10.1016/j.aap.2016.10.014. Epub 2016 Nov 15.

Improving autocoding performance of rare categories in injury classification: Is more training data or filtering the solution?

Accid Anal Prev. 2018 Jan;110:115-127. doi: 10.1016/j.aap.2017.10.020. Epub 2017 Nov 8.

Near-miss narratives from the fire service: a Bayesian analysis.

Accid Anal Prev. 2014 Jan;62:119-29. doi: 10.1016/j.aap.2013.09.012. Epub 2013 Oct 1.

A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms.

Accid Anal Prev. 2015 Nov;84:165-76. doi: 10.1016/j.aap.2015.06.014. Epub 2015 Sep 26.

Comparison of methods for auto-coding causation of injury narratives.

Accid Anal Prev. 2016 Mar;88:117-23. doi: 10.1016/j.aap.2015.12.006. Epub 2015 Dec 30.

Harnessing information from injury narratives in the 'big data' era: understanding and applying machine learning for injury surveillance.

Inj Prev. 2016 Apr;22 Suppl 1(Suppl 1):i34-42. doi: 10.1136/injuryprev-2015-041813. Epub 2016 Jan 4.

JEMs and incompatible occupational coding systems: effect of manual and automatic recoding of job codes on exposure assignment.

Ann Occup Hyg. 2013 Jan;57(1):107-14. doi: 10.1093/annhyg/mes046. Epub 2012 Jul 17.

Bayesian methods: a useful tool for classifying injury narratives into cause groups.

Inj Prev. 2009 Aug;15(4):259-65. doi: 10.1136/ip.2008.021337.

Cited by

Occupational Injury Risk Mitigation: Machine Learning Approach and Feature Optimization for Smart Workplace Surveillance.

Int J Environ Res Public Health. 2022 Oct 27;19(21):13962. doi: 10.3390/ijerph192113962.

Predicting occupational injury causal factors using text-based analytics: A systematic review.

Front Public Health. 2022 Sep 15;10:984099. doi: 10.3389/fpubh.2022.984099. eCollection 2022.

Application of a Machine Learning-Based Decision Support Tool to Improve an Injury Surveillance System Workflow.

Appl Clin Inform. 2022 May;13(3):700-710. doi: 10.1055/a-1863-7176. Epub 2022 May 29.

Applying Machine Learning to Workers' Compensation Data to Identify Industry-Specific Ergonomic and Safety Prevention Priorities: Ohio, 2001 to 2011.

J Occup Environ Med. 2018 Jan;60(1):55-73. doi: 10.1097/JOM.0000000000001162.
