Friesen Melissa C, Shortreed Susan M, Wheeler David C, Burstyn Igor, Vermeulen Roel, Pronk Anjoeka, Colt Joanne S, Baris Dalsu, Karagas Margaret R, Schwenn Molly, Johnson Alison, Armenti Karla R, Silverman Debra T, Yu Kai
1. Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA.
2. Biostatistics, Group Health Research Institute, Seattle, WA 98101-1448, USA.
Ann Occup Hyg. 2015 May;59(4):455-66. doi: 10.1093/annhyg/meu101. Epub 2014 Dec 3.
OBJECTIVES: Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may use judgment to reduce the number of patterns that need assessment, but different experts may group different response patterns into the same exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure.
METHODS: Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were developed separately for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the number of clusters extracted from each model's cluster tree from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m⁻³ respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. First, we examined each cluster's homogeneity (defined as >75% of its jobs having the same estimate) for two dichotomized probability estimates (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and the continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster.
RESULTS: Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The greatest within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster level and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs.
CONCLUSIONS: This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process.
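ILLUSTRATION: The clustering and homogeneity check described above can be sketched on toy data. This is a minimal sketch under stated assumptions, not the study's implementation: the abstract does not state the distance measure or linkage method, so Hamming distance and average linkage are assumed here, and the response patterns, the dichotomized estimates, and the function names `hamming`, `agglomerate`, and `is_homogeneous` are all hypothetical. Only the >75% homogeneity threshold and the idea of counting one decision per cluster plus one per job in non-homogeneous clusters come from the text.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of questionnaire items on which two jobs answered differently."""
    return sum(x != y for x, y in zip(a, b))


def agglomerate(patterns, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of job indices.
    """
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the two candidate clusters.
            total = sum(hamming(patterns[i], patterns[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return clusters


def is_homogeneous(cluster, exposed, threshold=0.75):
    """True if more than `threshold` of the jobs share the same estimate."""
    counts = Counter(exposed[i] for i in cluster)
    return max(counts.values()) / len(cluster) > threshold


# Hypothetical yes/no answers to four diesel-related questions for six jobs.
patterns = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
    (0, 0, 0, 1), (0, 0, 0, 1), (0, 0, 1, 1),
]
# Hypothetical dichotomized probability estimates (e.g. >=50% versus <50%).
exposed = [True, True, True, False, False, True]

clusters = agglomerate(patterns, n_clusters=2)
flags = [is_homogeneous(c, exposed) for c in clusters]

# One decision per cluster, plus one per job in each non-homogeneous cluster.
decisions = len(clusters) + sum(len(c) for c, ok in zip(clusters, flags) if not ok)
print(clusters, flags, decisions)
```

On this toy input the first cluster is homogeneous and the second is not, so the expert would make 5 decisions rather than 6. For data sets of realistic size, a library routine for hierarchical clustering would be preferable to this quadratic-per-merge loop.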