Department of Computer Science, Iowa State University, Ames, IA 50014, USA.
Department of Technology, College of Engineering, San Jose State University, San Jose, CA 95192, USA.
Int J Environ Res Public Health. 2020 Sep 27;17(19):7054. doi: 10.3390/ijerph17197054.
Mining is one of the most hazardous occupations in the world, and many serious mining accidents have occurred worldwide over the years. Despite efforts to create a safer work environment for miners, the number of accidents at mining sites remains significant. Machine learning techniques and predictive analytics are becoming leading resources for creating safer work environments in the manufacturing and construction industries, where they are used to generate actionable insights that improve decision-making. A large amount of mining safety-related data is available, and machine learning algorithms can be used to analyze it, so these techniques could significantly benefit the mining industry. Decision tree, random forest, and artificial neural network models were implemented to analyze the outcomes of mining accidents and to predict days away from work. An accident dataset provided by the Mine Safety and Health Administration was used to train the models. The models were trained separately on tabular data and on narratives. A synthetic data augmentation technique based on word embeddings was also investigated to address the data imbalance problem. The performance of all models was compared with that of a traditional logistic regression model. The results show that models trained on narratives performed better than models trained on structured/tabular data in predicting the outcome of an accident. This higher predictive power led to the conclusion that the narratives contain additional information relevant to the injury outcome that the tabular entries do not. In predicting days away from work, however, the models trained on tabular data had a lower mean squared error than the models trained on narratives. 
The results highlight the importance of predictors such as shift start time, accident time, and mining experience in predicting days away from work. After the data augmentation technique was applied, the F1 score improved for all but one of the underrepresented classes. This approach gave greater insight into the factors influencing the outcome of an accident and the days away from work.
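
The abstract does not describe how the word-embedding augmentation was implemented, so the following is only a minimal sketch of one common variant: replacing words in a narrative with their nearest neighbor in embedding space to create synthetic examples for underrepresented classes. The embedding table, its values, and the names `EMBEDDINGS`, `nearest_word`, and `augment` are all hypothetical and chosen for illustration; a real pipeline would presumably use pretrained embeddings and the authors' own replacement rules.

```python
import math
import random

# Toy embedding table. These vectors are made up for illustration only
# and are NOT taken from the paper or from any pretrained model.
EMBEDDINGS = {
    "fell": [0.90, 0.10, 0.00],
    "slipped": [0.85, 0.15, 0.05],
    "ladder": [0.10, 0.90, 0.10],
    "stairs": [0.15, 0.85, 0.10],
    "hand": [0.00, 0.10, 0.90],
    "finger": [0.05, 0.15, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(word):
    """Return the most similar *other* word in the vocabulary.

    Words without an embedding are returned unchanged.
    """
    if word not in EMBEDDINGS:
        return word
    candidates = [w for w in EMBEDDINGS if w != word]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def augment(narrative, replace_prob=0.5, rng=None):
    """Create a synthetic narrative by swapping known words for neighbors.

    Each token that has an embedding is replaced with probability
    `replace_prob`; all other tokens are kept as they are.
    """
    rng = rng or random.Random(0)
    out = []
    for tok in narrative.lower().split():
        if tok in EMBEDDINGS and rng.random() < replace_prob:
            out.append(nearest_word(tok))
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    # Replace every known word to make the effect easy to see.
    print(augment("Miner fell from ladder", replace_prob=1.0))
```

In a sketch like this, augmented narratives would be generated only for the minority classes and added to the training set, leaving the evaluation set untouched so that per-class F1 scores still reflect real narratives.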