
Characterizing accident narratives with word embeddings: Improving accuracy, richness, and generalizability.

Author information

Goldberg David M

Affiliation

San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, United States.

Publication information

J Safety Res. 2022 Feb;80:441-455. doi: 10.1016/j.jsr.2021.12.024. Epub 2021 Dec 29.

Abstract

INTRODUCTION

Ensuring occupational health and safety is an enormous concern for organizations, as accidents not only harm workers but also result in financial losses. Analysis of accident data has the potential to reveal insights that may improve capabilities to mitigate future accidents. However, because accident data are often transcribed textually, analyzing these narratives proves difficult. This study contributes to a recent stream of literature utilizing machine learning to automatically label accident narratives, converting them into more easily analyzable fields.

METHOD

First, a large dataset of accident narratives in which workers were injured is collected from the U.S. Occupational Safety and Health Administration (OSHA). Word embedding-based text mining is implemented; compared to past works, this methodology offers excellent performance. Second, to improve the richness of analyses, each record is assessed across five dimensions. The machine learning models provide classifications of body part(s) injured, the source of the injury, the type of event causing the injury, whether a hospitalization occurred, and whether an amputation occurred. Finally, to demonstrate generalizability, the trained models are deployed to analyze two additional datasets of accident narratives, one from the construction industry and one from the mining and metals industry (transfer learning).

PRACTICAL APPLICATIONS

These contributions improve organizations' capacities to rapidly analyze textual accident narratives.
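The general idea summarized in the abstract, turning a free-text narrative into a vector via word embeddings and then classifying it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional `TOY_VECTORS`, the training narratives, the label names, and the nearest-centroid classifier are invented here and are not taken from the paper, which does not specify its embeddings or classifier in the abstract.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real pipeline would load pretrained embeddings instead.
TOY_VECTORS = {
    "finger": (1.0, 0.0, 0.0),
    "hand": (0.9, 0.1, 0.0),
    "thumb": (0.95, 0.05, 0.0),
    "leg": (0.0, 1.0, 0.0),
    "knee": (0.1, 0.9, 0.0),
    "ankle": (0.05, 0.95, 0.0),
    "saw": (0.3, 0.0, 0.7),
    "ladder": (0.0, 0.3, 0.7),
}


def embed(narrative):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in narrative.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(examples):
    """Map each label to the mean embedding of its training narratives."""
    grouped = defaultdict(list)
    for text, label in examples:
        grouped[label].append(embed(text))
    return {
        label: tuple(sum(component) / len(vecs) for component in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(centroids, narrative):
    """Return the label whose centroid is most similar to the narrative."""
    vector = embed(narrative)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Invented training narratives for one dimension (body part injured).
TRAIN = [
    ("worker cut finger on saw", "upper extremity"),
    ("employee crushed hand in press", "upper extremity"),
    ("worker fell from ladder and broke leg", "lower extremity"),
    ("employee twisted knee on stairs", "lower extremity"),
]
BODY_PART_MODEL = fit_centroids(TRAIN)

print(predict(BODY_PART_MODEL, "operator injured thumb"))
```

Under this sketch, covering all five dimensions would mean fitting one such model per dimension on the same embedded narratives, and applying a fitted model unchanged to narratives from another industry would correspond to the transfer step the abstract mentions.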
