National Institute for Occupational Safety and Health, Division of Surveillance, Hazard Evaluations, and Field Studies, Industrywide Studies Branch, 4676 Columbia Parkway, Cincinnati, OH 45226, USA.
J Safety Res. 2012 Dec;43(5-6):327-32. doi: 10.1016/j.jsr.2012.10.012. Epub 2012 Nov 1.
Tracking and trending the rates of injuries and illnesses in different industry sectors is of high interest to many researchers, particularly for two categories: musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs), and slips, trips, or falls (STFs). Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers' compensation systems often requires reading and coding the free-form accident text narrative for potentially millions of records.
To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims.
The auto-coding program was able to code claims as an MSD, an STF, or other with approximately 90% accuracy.
The program developed and discussed in this paper provides an accurate and efficient method for identifying the cause of workers' compensation claims as an STF or MSD in a large database, based on the unstructured text narrative and resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading claims and identifying their cause as an STF or MSD. Furthermore, the method can be easily generalized to code or classify other unstructured text narratives.
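As an illustration of learning codes from previously hand-coded claims, the following is a minimal sketch only. The abstract does not name the learning algorithm, so the sketch assumes a bag-of-words multinomial naive Bayes classifier with Laplace smoothing. The class name `NaiveBayesCoder`, the `tokenize` helper, the label strings, and the training narratives are all invented for illustration and are not taken from the paper or its data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(narrative):
    """Lowercase a free-text narrative and split it into alphabetic words."""
    return re.findall(r"[a-z]+", narrative.lower())


class NaiveBayesCoder:
    """Hypothetical auto-coder: multinomial naive Bayes over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # claims seen per code
        self.word_counts = defaultdict(Counter) # word frequencies per code
        self.vocab = set()

    def fit(self, narratives, codes):
        """Learn word frequencies from manually coded claims."""
        for text, code in zip(narratives, codes):
            self.class_counts[code] += 1
            for word in tokenize(text):
                self.word_counts[code][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, narrative):
        """Return the code with the highest posterior log-probability."""
        # Words never seen in training carry no information and are skipped.
        words = [w for w in tokenize(narrative) if w in self.vocab]
        total_claims = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, n_claims in self.class_counts.items():
            score = math.log(n_claims / total_claims)  # log prior
            denom = (sum(self.word_counts[code].values())
                     + self.alpha * len(self.vocab))
            for word in words:
                count = self.word_counts[code][word]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Invented training set standing in for previously manually coded claims.
training = [
    ("strained lower back lifting heavy box", "MSD"),
    ("shoulder pain from repetitive motion on assembly line", "MSD"),
    ("pulled muscle pushing loaded cart", "MSD"),
    ("slipped on wet floor and fell", "STF"),
    ("tripped over cord and fell on knee", "STF"),
    ("fell from ladder while painting", "STF"),
    ("cut finger with box knife", "OTHER"),
    ("burned hand on hot oven", "OTHER"),
    ("struck by falling object on head", "OTHER"),
]
coder = NaiveBayesCoder().fit([t for t, _ in training], [c for _, c in training])

for claim in ["employee slipped on ice and fell in parking lot",
              "strained back lifting patient"]:
    print(claim, "->", coder.predict(claim))
```

In practice the training set would be a large sample of manually coded claims, and accuracy would be estimated on held-out coded claims that the classifier did not see during training. Injury diagnosis codes, which the abstract mentions as a second input, could be added as extra tokens alongside the narrative words.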