Burstyn Igor, Slutsky Anton, Lee Derrick G, Singer Alison B, An Yuan, Michael Yvonne L
1. Department of Environmental and Occupational Health, School of Public Health, Drexel University, Philadelphia, PA, USA.
Ann Occup Hyg. 2014 May;58(4):482-92. doi: 10.1093/annhyg/meu006. Epub 2014 Feb 6.
Epidemiologists typically collect narrative descriptions of occupational histories because these are less prone than self-reported exposures to recall bias in reporting exposure to a specific hazard. However, coding these narratives can be daunting and prohibitively time-consuming in some settings. The aim of this manuscript is to evaluate the performance of a computer algorithm that translates narrative descriptions of occupations into a standard classification of jobs (the 2010 Standard Occupational Classification) in an epidemiological context. The fundamental question we address is whether exposure assignment resulting from manual coding of the narratives (presumed to be the gold standard) differs materially from that arising from automated coding. We pursued this question through three motivating examples: assessment of physical demands in the Women's Health Initiative Observational Study; evaluation of predictors of exposure to coal tar pitch volatiles in the US Occupational Safety and Health Administration's (OSHA) Integrated Management Information System; and assessment of exposure to agents known to cause occupational asthma in a pregnancy cohort. In these diverse settings, we demonstrate that automated coding of occupations yields exposure assignments in reasonable agreement with those obtained through manual coding. The correlation between physical demand scores based on manual and automated job classification was reasonable (r = 0.5). The agreement between manual and automated coding of jobs in the predicted probability of exceeding OSHA's permissible exposure limit for polycyclic aromatic hydrocarbons, using coal tar pitch volatiles as a surrogate, was modest (Kendall rank correlation = 0.29).
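The two agreement statistics reported above measure different things: Pearson's r measures linear association between paired scores, while Kendall's rank correlation counts concordant versus discordant pairs and so depends only on ordering. The following is an illustrative, stdlib-only sketch, not the authors' analysis code. It assumes the tie-adjusted tau-b variant of Kendall's statistic, since the abstract does not state which variant was used, and the function names `pearson_r` and `kendall_tau_b` and the sample values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b, adjusted for ties) by O(n^2) pair counting."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: enters neither count below
        if dx == 0:
            ties_x += 1  # tied only on x
        elif dy == 0:
            ties_y += 1  # tied only on y
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both variables order the pair the same way
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical paired values for five subjects (illustration only, not study data).
manual_scores = [2.0, 3.5, 1.0, 4.0, 2.5]
automated_scores = [2.5, 3.0, 1.5, 3.5, 3.5]
print(pearson_r(manual_scores, automated_scores))
print(kendall_tau_b(manual_scores, automated_scores))
```

Because tau-b uses only the ordering of pairs, it suits predicted probabilities whose scales may differ between the two coding schemes. The quadratic pair loop is adequate for a sketch but slow for large samples.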
In the case of binary assignment of exposure to asthmagens, we observed that fair to excellent agreement in classification can be reached, depending on the presence of ambiguity in the assigned job classification (κ = 0.5–0.8). Thus, the success of automated coding appears to depend on the setting and the type of exposure being assessed. Our overall recommendation is that automated translation of short narrative job descriptions for exposure assessment is feasible in some settings and essential for large cohorts, especially if combined with manual coding both to assess the reliability of coding and to further refine the coding algorithm.
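The κ values above summarize agreement on a binary exposed/unexposed assignment. Cohen's kappa corrects raw agreement for chance, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each coder's marginal label frequencies. The following is a minimal, stdlib-only sketch of that formula, not the authors' analysis code. The function name `cohens_kappa` and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two coders assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: products of the coders' marginal proportions, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical exposed (1) / unexposed (0) assignments for ten jobs.
manual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
automated = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(manual, automated), 3))  # 0.583
```

The chance correction matters when an exposure is rare. Two coding schemes can agree on most jobs simply by labeling nearly all of them unexposed, so κ is a more informative summary than raw percent agreement.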