College of Medicine, Seoul National University, Seoul, Republic of Korea.
Institute of Health and Environment, Seoul National University School of Public Health, Seoul, Republic of Korea.
J Med Internet Res. 2020 Aug 13;22(8):e19222. doi: 10.2196/19222.
In most industrialized societies, regulations, inspections, insurance, and legal options are established to support workers who suffer injury, disease, or death in relation to their work; in practice, these resources are imperfect or even unavailable because of workplace or employer obstruction. As a result, unmet needs for occupational safety and health information are difficult to identify.
The aim of this study was to explore hidden issues related to occupational accidents by examining social network services (SNS) data using topic modeling.
Based on the results of a Google search for the phrases "occupational accident," "industrial accident," and "occupational diseases," a total of 145 websites were selected. From these websites, we collected 15,244 documents containing queries related to occupational accidents posted between 2002 and 2018. To transform the unstructured text into structured data, natural language processing of the Korean language was conducted. We applied latent Dirichlet allocation (LDA) as the topic model using a Python library. A time-series linear regression analysis was also conducted to identify yearly trends in the collected documents.
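The topic modeling step can be illustrated with a minimal sketch. The abstract does not name the Python library or the inference method, so the code below uses only the standard library and fits LDA by collapsed Gibbs sampling, which is one common approach and not necessarily the one the authors used. The function names `fit_lda` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy English tokens (standing in for Korean text that has already been tokenized) are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on pre-tokenized documents with collapsed Gibbs sampling.

    alpha and beta are illustrative Dirichlet priors, not values from the study.
    Returns per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assignments = []                                      # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]

                # Sample a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, topic, n=3):
    """Return the n most frequent words assigned to one topic."""
    return [word for word, _ in topic_word[topic].most_common(n)]


# Toy corpus (invented, not study data): each document is a list of tokens.
docs = [
    ["insurance", "compensation", "approval", "insurance", "benefit"],
    ["compensation", "benefit", "disability", "insurance"],
    ["employer", "agreement", "settlement", "employer", "refuse"],
    ["agreement", "employer", "settlement", "pressure"],
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k in range(2):
    print(k, top_words(topic_word, k))
```

In a real pipeline, the token lists would come from a Korean morphological analyzer, and the number of topics would be chosen by the analyst; a maintained library implementation would normally be preferred over this hand-written sampler.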
The LDA model yielded 14 topics grouped into 3 themes: workers' compensation benefits (Theme 1), illicit agreements with the employer (Theme 2), and fatal and non-fatal injuries and vulnerable workers (Theme 3). Theme 1 represented the largest cluster (52.2%) of the collected documents and included keywords related to workers' compensation (ie, company, occupational injury, insurance, accident, approval, and compensation) and keywords describing specific compensation benefits such as medical expense benefits, temporary incapacity benefits, and disability benefits. In the yearly trend, Theme 1 gradually decreased, whereas the other themes showed an overall increasing pattern. Queries on certain topics (ie, musculoskeletal system, critical care, and foreign workers) showed no significant variation in number.
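A yearly trend of the kind described here can be estimated with an ordinary least-squares line fitted to a theme's share of documents per year. The following is a minimal sketch under stated assumptions: the function name `linear_trend` is hypothetical, the yearly shares are invented toy numbers and not results from the study, and the significance testing that a full regression analysis would include is omitted.

```python
def linear_trend(years, values):
    """Fit values = intercept + slope * year by ordinary least squares."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: share of documents assigned to one theme in each year.
years = [2014, 2015, 2016, 2017, 2018]
theme_share = [0.60, 0.58, 0.55, 0.53, 0.50]

slope, intercept = linear_trend(years, theme_share)
direction = "decreasing" if slope < 0 else "increasing or flat"
print(f"slope per year: {slope:.4f} ({direction})")
```

A negative slope corresponds to a theme whose share shrinks over the years, and a positive slope to one that grows; whether the change is statistically significant would require a test on the slope, which this sketch leaves out.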
We conducted an LDA analysis of SNS data consisting of occupational accident-related queries and found that the primary concerns of workers posting about occupational injuries and diseases were workers' compensation benefits, fatal and non-fatal injuries, vulnerable workers, and illicit agreements with employers. While traditional systems focus mainly on quantitative monitoring of occupational accidents, qualitative insights derived by topic modeling from unstructured SNS queries may be valuable for addressing inequalities and improving occupational health and safety.