National Centre for Health Information Research and Training, Queensland University of Technology, Victoria Park Road, Kelvin Grove, Queensland, 4059, Australia.
BMC Med Inform Decis Mak. 2010 Apr 7;10:19. doi: 10.1186/1472-6947-10-19.
Work-related injuries in Australia are estimated to cost around $57.5 billion annually; however, there are currently insufficient surveillance data to support an evidence-based public health response. Emergency departments (EDs) in Australia are a potential source of information on work-related injuries. However, most EDs do not have an 'Activity Code' to identify work-related cases, and information about the presenting problem is recorded in a short free-text field. This study compared methods for interrogating this text field to identify work-related injuries presenting at EDs, in order to inform approaches to the surveillance of work-related injury.
Three approaches were used to interrogate an injury description text field and classify cases as work-related: keyword search, index search, and content analytic text mining. Sensitivity and specificity were examined by comparing the cases flagged by each approach with the cases coded as work-related with an Activity Code during triage. Ways to improve the sensitivity and/or specificity of each approach were explored by adjusting the classification techniques within each broad approach.
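As an illustration of the simplest approach and of how it can be scored, the sketch below flags a free-text injury description as work-related when it contains any keyword. It then computes sensitivity and specificity against a reference standard, which here stands in for the triage Activity Code. The keyword list, the function names and the example records are hypothetical; the abstract does not report the study's actual search terms or implementation.

```python
import re
from typing import Iterable

# Hypothetical keyword list: the study's actual search terms are not given in the abstract.
KEYWORDS = ("work", "working", "workplace", "job", "employer", "shift")

# Whole-word, case-insensitive match on any keyword.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)


def keyword_flag(text: str) -> bool:
    """Return True if the injury description contains any keyword (hypothetical helper)."""
    return PATTERN.search(text) is not None


def sensitivity_specificity(flags: Iterable[bool], gold: Iterable[bool]) -> tuple[float, float]:
    """Compare text-search flags with a reference standard (e.g. Activity Code == work)."""
    tp = fp = tn = fn = 0
    for flagged, is_work in zip(flags, gold):
        if flagged and is_work:
            tp += 1  # work-related case detected
        elif flagged:
            fp += 1  # non-work case wrongly flagged
        elif is_work:
            fn += 1  # work-related case missed
        else:
            tn += 1  # non-work case correctly left unflagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented example records: (free-text description, coded as work-related at triage)
    records = [
        ("cut hand on grinder at work", True),
        ("fell off ladder in workplace", True),
        ("lacerated finger with box cutter", True),  # missed: no keyword present
        ("sprained ankle jogging after shift", False),  # false positive on "shift"
        ("fell playing football", False),
        ("dog bite at home", False),
    ]
    flags = [keyword_flag(text) for text, _ in records]
    gold = [is_work for _, is_work in records]
    sens, spec = sensitivity_specificity(flags, gold)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.67 specificity=0.67
```

The toy records show the trade-off the study explores. Descriptions that never mention work are missed, which lowers sensitivity. Broad terms such as "shift" flag non-work cases, which lowers specificity. Adding or removing keywords moves the two measures in opposite directions.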
The basic keyword search detected 58% of work-related cases (specificity 0.99), the index search detected 62% (specificity 0.87), and the content analytic text mining approach (using adjusted probabilities) detected 77% (specificity 0.95).
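Under the usual definitions, "detected 58% of cases" is the method's sensitivity against the Activity Code reference standard. Here TP denotes cases that are both flagged by the text method and coded as work-related. FN denotes coded work-related cases that were not flagged. TN and FP are the corresponding counts among cases not coded as work-related.

$$
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}
$$

On this reading, the keyword search has sensitivity 0.58 and specificity 0.99, the index search 0.62 and 0.87, and the content analytic text mining approach 0.77 and 0.95.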
The findings of this study provide strong support for the continued development of text-searching methods to obtain information from routine emergency department data, thereby improving the capacity for comprehensive injury surveillance.