Goldberg David M
San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, United States.
J Safety Res. 2022 Feb;80:441-455. doi: 10.1016/j.jsr.2021.12.024. Epub 2021 Dec 29.
Ensuring occupational health and safety is an enormous concern for organizations, as accidents not only harm workers but also result in financial losses. Analysis of accident data has the potential to reveal insights that may improve capabilities to mitigate future accidents. However, because accident data are often transcribed textually, analyzing these narratives proves difficult. This study contributes to a recent stream of literature utilizing machine learning to automatically label accident narratives, converting them into more easily analyzable fields.
First, a large dataset of accident narratives in which workers were injured is collected from the U.S. Occupational Safety and Health Administration (OSHA). Text mining based on word embeddings is implemented; compared to past works, this methodology offers excellent performance. Second, to improve the richness of analyses, each record is assessed across five dimensions: the machine learning models classify the body part(s) injured, the source of the injury, the type of event causing the injury, whether a hospitalization occurred, and whether an amputation occurred. Finally, to demonstrate generalizability, the trained models are applied through transfer learning to two additional datasets of accident narratives, one from the construction industry and one from the mining and metals industry. Practical Applications: These contributions improve organizations' capacities to rapidly analyze textual accident narratives.
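To make the labeling idea concrete, the following is a minimal sketch of embedding-based narrative classification with one classifier per output dimension. It is not the study's implementation: the three-dimensional `EMBEDDINGS` table, the four training narratives, the nearest-centroid classifier, and the names `embed`, `CentroidClassifier`, and `classify` are all hypothetical, chosen only so the example runs with the Python standard library. Only two of the five dimensions are shown.

```python
import math
from collections import defaultdict

# Hypothetical toy word embeddings (a real system would load pretrained vectors).
EMBEDDINGS = {
    "finger": [1.0, 0.0, 0.0], "hand": [0.9, 0.1, 0.0],
    "saw": [0.8, 0.0, 0.2], "amputated": [1.0, 0.0, 0.1],
    "fell": [0.0, 1.0, 0.0], "ladder": [0.0, 0.9, 0.1],
    "leg": [0.1, 0.8, 0.0], "fractured": [0.0, 0.9, 0.2],
}
DIM = 3


def embed(narrative):
    """Represent a narrative as the mean of its known word vectors."""
    vectors = [EMBEDDINGS[t] for t in narrative.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class CentroidClassifier:
    """Nearest-centroid classifier over narrative embeddings (illustrative only)."""

    def fit(self, narratives, labels):
        grouped = defaultdict(list)
        for text, label in zip(narratives, labels):
            grouped[label].append(embed(text))
        # One centroid per label: the mean of that label's narrative vectors.
        self.centroids = {
            label: [sum(column) / len(vecs) for column in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, narrative):
        vector = embed(narrative)
        return max(self.centroids, key=lambda lbl: cosine(vector, self.centroids[lbl]))


# Hypothetical labeled narratives; two of the five dimensions are shown.
TRAIN = [
    ("employee amputated finger on saw", {"body_part": "finger", "amputation": "yes"}),
    ("hand caught in saw finger amputated", {"body_part": "finger", "amputation": "yes"}),
    ("worker fell from ladder and fractured leg", {"body_part": "leg", "amputation": "no"}),
    ("employee fell leg fractured", {"body_part": "leg", "amputation": "no"}),
]

texts = [text for text, _ in TRAIN]
# Train one independent classifier per output dimension.
MODELS = {
    dimension: CentroidClassifier().fit(texts, [labels[dimension] for _, labels in TRAIN])
    for dimension in ("body_part", "amputation")
}


def classify(narrative):
    """Return a predicted label for every modeled dimension."""
    return {dimension: model.predict(narrative) for dimension, model in MODELS.items()}


print(classify("worker fell off ladder"))
```

Training a separate model per dimension keeps each label independent, so the same fitted models can be applied unchanged to narratives from another source, which is the simplest form of the transfer setting described above.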