Department of Mathematics and Statistics, University of Calgary, Calgary, Canada.
BMC Med Res Methodol. 2012 Sep 4;12:135. doi: 10.1186/1471-2288-12-135.
In epidemiological studies, it is often not possible to measure participants' exposures accurately, even when the response variable can be measured without error. When there are several groups of subjects, occupational epidemiologists employ a group-based strategy (GBS) for exposure assessment to reduce bias due to measurement error: when evaluating the effect of exposure on the response, every individual in a group/job within the study sample is assigned the sample mean of the exposure measurements from that group. Exposure is therefore estimated on an ecological level, while health outcomes are ascertained for each subject. Such a study design leads to negligible bias in risk estimates when group means are estimated from 'large' samples. In many cases, however, only a small number of observations are available to estimate the group means, and this biases the observed exposure-disease association. In addition, the analysis in a semi-ecological design may involve exposure data of which the majority are missing and the rest are observed with measurement error, alongside complete response data collected with ascertainment.
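The group-based assignment described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the study's simulation design: the function names `group_based_exposure` and `simulate_slope`, the four group means, the unit variances, and the linear outcome model are all hypothetical choices made for this example.

```python
import numpy as np

def group_based_exposure(group, measured):
    """Assign every subject the sample mean of the exposure
    measurements available in their group (NaN = not measured)."""
    group = np.asarray(group)
    measured = np.asarray(measured, dtype=float)
    assigned = np.empty(group.size, dtype=float)
    for g in np.unique(group):
        in_g = group == g
        assigned[in_g] = np.nanmean(measured[in_g])
    return assigned

def simulate_slope(n_measured, rng, n_per_group=100, beta=1.0):
    """One simulated study: return the GBS slope estimate.
    All numeric settings here are assumptions for illustration."""
    true_means = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical group means
    group = np.repeat(np.arange(true_means.size), n_per_group)
    true_x = true_means[group] + rng.normal(0.0, 1.0, group.size)
    y = beta * true_x + rng.normal(0.0, 1.0, group.size)  # error-free response
    w = true_x + rng.normal(0.0, 1.0, group.size)  # error-prone measurement
    # Only n_measured subjects per group have an exposure measurement.
    measured = np.full(group.size, np.nan)
    for g in range(true_means.size):
        idx = rng.choice(np.flatnonzero(group == g), size=n_measured, replace=False)
        measured[idx] = w[idx]
    x_hat = group_based_exposure(group, measured)
    slope, _intercept = np.polyfit(x_hat, y, 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n_measured in (2, 50):
        slopes = [simulate_slope(n_measured, rng) for _ in range(200)]
        print(n_measured, "measurements per group: mean slope", np.mean(slopes))
```

Comparing the average slope for few versus many measurements per group against the assumed true value `beta` gives a rough sense of how the precision of the group means affects the estimated association.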
In workplaces, groups/jobs are naturally ordered, and this ordering can be incorporated into the estimation procedure by combining constrained estimation methods with the expectation-maximization (EM) algorithm for regression models with measurement error and missing values. Four methods were compared in a simulation study: naive complete-case analysis, GBS, constrained GBS (CGBS), and constrained expectation-maximization (CEM). We illustrated the methods in an analysis of decline in lung function due to exposure to carbon black.
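One common way to impose a simple order constraint on group means is weighted isotonic regression computed with the pool-adjacent-violators algorithm. The abstract does not state which constrained estimator was used, so the sketch below is an assumption offered only to show the idea; the function name `ordered_group_means` is hypothetical, and the weights are taken to be the number of measurements per group.

```python
def ordered_group_means(means, weights):
    """Weighted least-squares fit of a non-decreasing sequence to the
    raw group means (pool-adjacent-violators algorithm).

    means:   raw sample means, listed in the assumed exposure order
    weights: number of exposure measurements behind each mean
    """
    blocks = []  # each block holds [pooled mean, total weight, groups pooled]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Pool neighbouring blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for m, _, count in blocks:
        fitted.extend([m] * count)
    return fitted

# The second and third raw means violate the ordering and are pooled.
print(ordered_group_means([1.0, 3.0, 2.0, 4.0], [1, 1, 1, 1]))
# prints [1.0, 2.5, 2.5, 4.0]
```

In a constrained version of GBS, the fitted values would replace the raw group means before subjects are assigned their group's exposure; whether the study applied the constraint in exactly this way is not stated in the abstract.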
The naive and GBS approaches were shown to be inadequate when the number of exposure measurements is too small to estimate group means accurately. The CEM method appears to be the best of the four when, within each exposure group, at least a 'moderate' number of individuals have their exposures observed with error. Compared with CEM, however, CGBS is easier to implement and has more desirable bias-reducing properties when a substantial proportion of exposure data is missing.
The CGBS approach could be useful for estimating exposure-disease associations in semi-ecological studies when the true group means are ordered and the number of measured exposures in each group is small. These findings have important implications for the cost-effective design of semi-ecological studies, because they enable investigators to estimate exposure-disease associations more reliably, and with a smaller exposure measurement campaign, than with the analytical methods that were historically employed.