Friesen Melissa C, Lan Qing, Ge Calvin, Locke Sarah J, Hosgood Dean, Fritschi Lin, Sadkowsky Troy, Chen Yu-Cheng, Wei Hu, Xu Jun, Lam Tai Hing, Kwong Yok Lam, Chen Kexin, Xu Caigang, Su Yu-Chieh, Chiu Brian C H, Ip Kai Ming Dennis, Purdue Mark P, Bassig Bryan A, Rothman Nat, Vermeulen Roel
1. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, North Bethesda, MD 20980, USA;
2. University of Utrecht, Utrecht, The Netherlands;
Ann Occup Hyg. 2016 Aug;60(7):885-99. doi: 10.1093/annhyg/mew029. Epub 2016 Jun 1.
In community-based epidemiological studies, job- and industry-specific 'modules' are often used to systematically obtain details about the subject's work tasks. The module assignment is often made by the interviewer, who may have insufficient occupational hygiene knowledge to assign the correct module. We evaluated, in the context of a case-control study of lymphoid neoplasms in Asia ('AsiaLymph'), the performance of an algorithm that provided automatic, real-time module assignment during a computer-assisted personal interview.
AsiaLymph's occupational component began with a lifetime occupational history questionnaire with free-text responses and three solvent exposure screening questions. To assign each job to one of 23 study-specific modules, an algorithm automatically searched the free-text responses to the questions 'job title' and 'product made or services provided by employer' using a list of over 5800 module-specific keywords in English, Traditional Chinese, and Simplified Chinese. Hierarchical decision rules were used when the keyword match triggered multiple modules. If no keyword match was identified, a generic solvent module was assigned if the subject responded 'yes' to any of the three solvent screening questions. If these question responses were all 'no', a work location module was assigned, which redirected the subject to the farming, teaching, health professional, solvent, or industry solvent modules or ended the questions for that job, depending on the location response. We conducted a reliability assessment that compared the algorithm-assigned modules to consensus module assignments made by two industrial hygienists for a subset of 1251 (of 11409) jobs selected by stratified random sampling with module-specific strata. Discordant assignments between the algorithm and consensus assignments (483 jobs) were qualitatively reviewed by the hygienists to evaluate the potential information lost (none, low, medium, high) from questions missed by using the algorithm-assigned module.
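The assignment cascade described above (keyword search, hierarchical rule for multiple matches, solvent screening fallback, work location fallback) can be sketched roughly as follows. This is only an illustration, not the study's implementation: the keyword lists, the module names, the priority order, and the function `assign_module` are all hypothetical, and plain substring matching is assumed because the abstract does not describe the matching or the decision rules in detail.

```python
# Hypothetical keyword lists; the study's actual list held over 5800 keywords.
MODULE_KEYWORDS = {
    "farming": ["farmer", "rice", "农民"],
    "dry_cleaning_textile": ["dry clean", "textile", "纺织"],
    "teaching": ["teacher", "school"],
}

# Hypothetical priority order standing in for the hierarchical decision rules.
MODULE_PRIORITY = ["dry_cleaning_textile", "farming", "teaching"]


def assign_module(job_title, product_or_service, solvent_answers):
    """Return a module name for one job.

    solvent_answers: three booleans, one per solvent screening question.
    """
    # Search both free-text responses; naive substring matching is an assumption.
    text = f"{job_title} {product_or_service}".lower()
    matched = {
        module
        for module, keywords in MODULE_KEYWORDS.items()
        if any(keyword.lower() in text for keyword in keywords)
    }

    # One or more keyword matches: pick the highest-priority matched module.
    for module in MODULE_PRIORITY:
        if module in matched:
            return module

    # No keyword match: fall back on the solvent screening questions.
    if any(solvent_answers):
        return "generic_solvent"

    # All screening answers were 'no': ask about work location instead.
    return "work_location"


if __name__ == "__main__":
    print(assign_module("Rice farmer", "grows rice", [False, False, False]))
    print(assign_module("Clerk", "bank", [False, True, False]))
    print(assign_module("Clerk", "bank", [False, False, False]))
```

In this sketch, a job whose description matches keywords from several modules is resolved by a single fixed ordering; the study's rules may well have been more specific to particular module combinations.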
The most frequently assigned modules were the work location (33%), solvent (20%), farming and food industry (19%), and dry cleaning and textile industry (6.4%) modules. In the reliability subset, the algorithm assignment had an exact match to the expert consensus-assigned module for 722 (57.7%) of the 1251 jobs. Overall, adjusted for the proportion of jobs in each stratum, we estimated that 86% of the algorithm-assigned modules would result in no information loss, 2% would have low information loss, and 12% would have medium to high information loss. Medium to high information loss occurred for <10% of the jobs assigned the generic solvent module and for 21, 32, and 31% of the jobs assigned the work location module with location responses of 'someplace else', 'factory', and 'don't know', respectively. Other work location responses had ≤8% with medium to high information loss because of redirections to other modules. Medium to high information loss occurred more frequently when a job description matched with multiple keywords pointing to different modules (29-69%, depending on the triggered assignment rule).
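Because the reliability subset was drawn with module-specific strata, overall percentages have to be weighted by each stratum's share of all jobs rather than pooled directly. A minimal sketch of that kind of adjustment is given below; the function `weighted_proportion` and the stratum counts are purely hypothetical illustrations, not the study's data or its exact estimator.

```python
def weighted_proportion(strata):
    """Stratum-weighted proportion.

    strata: list of (jobs_in_population, jobs_sampled, sampled_jobs_with_outcome).
    Each stratum's sample proportion is weighted by its share of the population.
    """
    total_jobs = sum(n_pop for n_pop, _, _ in strata)
    return sum(
        (n_pop / total_jobs) * (n_hit / n_sampled)
        for n_pop, n_sampled, n_hit in strata
    )


def pooled_proportion(strata):
    """Unweighted proportion that ignores the sampling design (for comparison)."""
    sampled = sum(n_sampled for _, n_sampled, _ in strata)
    hits = sum(n_hit for _, _, n_hit in strata)
    return hits / sampled


if __name__ == "__main__":
    # Hypothetical counts: a large stratum and a small, oversampled stratum.
    example_strata = [(9000, 100, 90), (1000, 100, 40)]
    print(round(weighted_proportion(example_strata), 2))  # roughly 0.85
    print(round(pooled_proportion(example_strata), 2))    # roughly 0.65
```

With these made-up counts, the pooled figure understates the weighted one because the small stratum with more losses is oversampled, which illustrates why an exact-match rate within the subset and an adjusted overall estimate need not agree.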
These evaluations demonstrated that automatically assigned modules can reliably reproduce an expert's module assignment without the direct involvement of an industrial hygienist or interviewer. The feasibility of adapting this framework to other studies will be language- and exposure-specific.