Department of Environmental & Occupational Health, Drexel University School of Public Health, 1505 Race Street, MS 1034, Philadelphia, PA 19102, United States.
Accid Anal Prev. 2014 Jan;62:119-29. doi: 10.1016/j.aap.2013.09.012. Epub 2013 Oct 1.
In occupational safety research, narrative text analysis has been combined with coded surveillance data to improve identification and understanding of injuries and their circumstances. Injury data give information about incidence and the direct cause of an injury, while near-miss data enable the identification of various hazards within an organization or industry. Further, near-miss data provide an opportunity for surveillance and risk reduction. The National Firefighter Near-Miss Reporting System (NFFNMRS) is a voluntary reporting system that collects narrative text data on near-miss and injurious events within the fire and emergency services industry. In recent research, autocoding techniques using Bayesian models have been used to categorize/code injury narratives with up to 90% accuracy, thereby reducing the amount of human effort required to manually code large datasets. Autocoding techniques have not yet been applied to near-miss narrative data.
We manually assigned mechanism of injury codes to previously uncoded narratives from the NFFNMRS and used this as a training set to develop two Bayesian autocoding models, Fuzzy and Naïve. We calculated sensitivity, specificity and positive predictive value for both models. We also evaluated the effect of training set size on prediction sensitivity and compared the models' predictive ability as related to injury outcome. We cross-validated a subset of the prediction set for accuracy of the model predictions.
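As an illustration of how a manually coded training set can drive an autocoder, the following is a minimal sketch of a generic multinomial Naïve Bayes text classifier with add-one smoothing. It is not the authors' implementation, and it does not attempt the Fuzzy model. The function names, the whitespace tokenization, and the toy narratives and codes are all assumptions made up for this example; they are not NFFNMRS data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(narratives, codes):
    """Count words per code from a manually coded training set."""
    code_counts = Counter(codes)            # number of narratives per code
    word_counts = defaultdict(Counter)      # word frequencies within each code
    vocab = set()
    for text, code in zip(narratives, codes):
        words = text.lower().split()        # naive whitespace tokenization (assumption)
        word_counts[code].update(words)
        vocab.update(words)
    return code_counts, word_counts, vocab


def predict_naive_bayes(model, text):
    """Return the code with the highest posterior log-probability."""
    code_counts, word_counts, vocab = model
    total = sum(code_counts.values())
    best_code, best_score = None, -math.inf
    for code, n in code_counts.items():
        score = math.log(n / total)         # log prior for this code
        denom = sum(word_counts[code].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:               # words never seen in training are skipped
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[code][word] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical toy training set, invented for illustration only.
toy_narratives = [
    "firefighter fell through the roof",
    "slipped and fell on ice",
    "burned by flashover in hallway",
    "flashover burned crew gear",
]
toy_codes = ["fall", "fall", "fire", "fire"]

toy_model = train_naive_bayes(toy_narratives, toy_codes)
print(predict_naive_bayes(toy_model, "crew member fell from ladder"))
```

Retraining this sketch on progressively larger subsets of the coded narratives and re-measuring sensitivity is one way to examine the effect of training set size described above.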
Overall, the Fuzzy model performed better than Naïve, with a sensitivity of 0.74 compared to 0.678. Where Fuzzy and Naïve shared the same prediction, the cross-validation showed a sensitivity of 0.602. As the number of records in the training set increased, the models performed at a higher sensitivity, suggesting that both the Fuzzy and Naïve models were essentially "learning". Injury records were predicted with greater sensitivity than near-miss records.
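For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity and positive predictive value can be computed for a single mechanism code by comparing manual codes with model predictions. The function name `category_metrics` and the example labels are hypothetical and chosen only for illustration; the numbers it produces are unrelated to the study's results.

```python
def category_metrics(actual, predicted, category):
    """Sensitivity, specificity and PPV for one code, treating it as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == category and p == category for a, p in pairs)  # true positives
    fn = sum(a == category and p != category for a, p in pairs)  # false negatives
    fp = sum(a != category and p == category for a, p in pairs)  # false positives
    tn = sum(a != category and p != category for a, p in pairs)  # true negatives

    def ratio(num, den):
        # Return None when the measure is undefined (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases the model found
        "specificity": ratio(tn, tn + fp),  # share of non-cases correctly excluded
        "ppv": ratio(tp, tp + fp),          # share of predicted cases that were right
    }


# Hypothetical manual codes and model predictions, invented for illustration.
manual_codes = ["fall", "fall", "fire", "fire", "struck"]
model_codes = ["fall", "fire", "fire", "fire", "struck"]

print(category_metrics(manual_codes, model_codes, "fall"))
```

In this toy data, one of the two manually coded "fall" narratives is predicted as "fire", so the sensitivity for "fall" is one half while its specificity and positive predictive value are both 1.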
We conclude that the application of Bayesian autocoding methods can successfully code both near misses and injuries in longer-than-average narratives with non-specific prompts regarding injury. Such coding allowed for the creation of two new quantitative data elements for injury outcome and injury mechanism.