Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability.

Author information

Goldberg David M

Affiliation

San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, United States.

Publication information

J Safety Res. 2022 Feb;80:441-455. doi: 10.1016/j.jsr.2021.12.024. Epub 2021 Dec 29.

DOI:10.1016/j.jsr.2021.12.024

PMID:35249625

Abstract

INTRODUCTION

Ensuring occupational health and safety is an enormous concern for organizations, as accidents not only harm workers but also result in financial losses. Analysis of accident data has the potential to reveal insights that may improve capabilities to mitigate future accidents. However, because accident data are often transcribed textually, analyzing these narratives proves difficult. This study contributes to a recent stream of literature utilizing machine learning to automatically label accident narratives, converting them into more easily analyzable fields.

METHOD

First, a large dataset of accident narratives in which workers were injured is collected from the U.S. Occupational Safety and Health Administration (OSHA). Word embeddings-based text mining is implemented; compared to past works, this methodology offers excellent performance. Second, to improve the richness of analyses, each record is assessed across five dimensions. The machine learning models provide classifications of body part(s) injured, the source of the injury, the type of event causing the injury, whether a hospitalization occurred, and whether an amputation occurred. Finally, demonstrating generalizability, the trained models are deployed to analyze two additional datasets of accident narratives in the construction industry and the mining and metals industry (transfer learning).

PRACTICAL APPLICATIONS

These contributions improve organizations' capacities to rapidly analyze textual accident narratives.
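
As a rough illustration of the kind of embedding-based labeling the METHOD section describes, the sketch below averages word vectors for each narrative and assigns the label of the nearest class centroid. It is not the study's actual pipeline: the three-dimensional vectors in `TOY_EMBEDDINGS`, the training narratives, the label names, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the example self-contained. A real system would presumably load pretrained embeddings and train a separate model for each of the five dimensions.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real pipeline would load pretrained embeddings with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "finger": [1.0, 0.0, 0.0],
    "hand":   [0.9, 0.1, 0.0],
    "saw":    [0.8, 0.0, 0.2],
    "cut":    [0.7, 0.0, 0.3],
    "leg":    [0.0, 1.0, 0.0],
    "knee":   [0.1, 0.9, 0.0],
    "ladder": [0.0, 0.8, 0.2],
    "fell":   [0.0, 0.7, 0.3],
}
DIM = 3


def embed(narrative):
    """Represent a narrative as the mean of its known word vectors."""
    vectors = [TOY_EMBEDDINGS[w] for w in narrative.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_narratives):
    """Average the narrative vectors of each label into one centroid."""
    grouped = defaultdict(list)
    for text, label in labeled_narratives:
        grouped[label].append(embed(text))
    return {label: [sum(column) / len(vecs) for column in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict(centroids, narrative):
    """Return the label whose centroid is most similar to the narrative."""
    vector = embed(narrative)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical training data for a single dimension (body part injured).
TRAIN = [
    ("worker cut finger on saw", "hand"),
    ("hand caught in saw", "hand"),
    ("employee fell from ladder and broke leg", "leg"),
    ("fell and twisted knee", "leg"),
]
centroids = train_centroids(TRAIN)
```

Calling `predict(centroids, "miner fell injured knee")` applies the trained centroids to a narrative worded for a different industry without any retraining, which is the spirit of the transfer step described in the abstract. Because words missing from the embedding table are simply skipped, a narrative with no known vocabulary maps to the zero vector and its prediction is arbitrary.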

Similar articles

Predictive Modeling for Occupational Safety Outcomes and Days Away from Work Analysis in Mining Operations.

Int J Environ Res Public Health. 2020 Sep 27;17(19):7054. doi: 10.3390/ijerph17197054.

Construction accident narrative classification: An evaluation of text mining techniques.

Accid Anal Prev. 2017 Nov;108:122-130. doi: 10.1016/j.aap.2017.08.026. Epub 2017 Sep 1.

Enhancing safety of construction workers in Korea: an integrated text mining and machine learning framework for predicting accident types.

Int J Inj Contr Saf Promot. 2024 Jun;31(2):203-215. doi: 10.1080/17457300.2023.2300424. Epub 2024 Jan 2.

Contextualizing injury severity from occupational accident reports using an optimized deep learning prediction model.

PeerJ Comput Sci. 2024 Apr 17;10:e1985. doi: 10.7717/peerj-cs.1985. eCollection 2024.

Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

Accid Anal Prev. 2017 Jan;98:359-371. doi: 10.1016/j.aap.2016.10.014. Epub 2016 Nov 15.

A model based on PDCA and data mining approach for the prevention of occupational accidents in the plumbing activity in the construction sector.

Work. 2024;78(2):399-410. doi: 10.3233/WOR-230112.

From unstructured accident reports to a hybrid decision support system for occupational risk management: The consensus converging approach.

J Safety Res. 2024 Jun;89:91-104. doi: 10.1016/j.jsr.2024.02.006. Epub 2024 Mar 1.

A study of the shift in fatal construction work-related accidents during 2012-2019 in Turkey.

Int J Occup Saf Ergon. 2022 Sep;28(3):1522-1532. doi: 10.1080/10803548.2021.1900503. Epub 2021 Apr 5.

Identifying low-quality patterns in accident reports from textual data.

Int J Occup Saf Ergon. 2023 Sep;29(3):1088-1100. doi: 10.1080/10803548.2022.2111847. Epub 2022 Sep 13.

Cited by

A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management.

Int J Environ Res Public Health. 2024 Jun 26;21(7):831. doi: 10.3390/ijerph21070831.

Contextualizing injury severity from occupational accident reports using an optimized deep learning prediction model.

PeerJ Comput Sci. 2024 Apr 17;10:e1985. doi: 10.7717/peerj-cs.1985. eCollection 2024.

Occupational Injury Risk Mitigation: Machine Learning Approach and Feature Optimization for Smart Workplace Surveillance.

Int J Environ Res Public Health. 2022 Oct 27;19(21):13962. doi: 10.3390/ijerph192113962.

Application of a Machine Learning-Based Decision Support Tool to Improve an Injury Surveillance System Workflow.

Appl Clin Inform. 2022 May;13(3):700-710. doi: 10.1055/a-1863-7176. Epub 2022 May 29.
