Li Wenjun, Stanek Edward J, Bertone-Johnson Elizabeth R
Division of Preventive and Behavioral Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA.
Epidemiol Perspect Innov. 2008 Jan 25;5:2. doi: 10.1186/1742-5573-5-2.
Adjustment for covariates (also called auxiliary variables in the survey sampling literature) is commonly applied in health surveys to reduce the variance of prevalence estimators. In theory, adjusted prevalence estimators are more accurate when the variance components are known. In practice, the variance components needed for the adjustment are unknown, and their sample estimates are used instead. The uncertainty introduced by estimating the variance components may outweigh the variance reduction that adjustment provides. We present empirical guidelines indicating when adjusted prevalence estimators should be considered, using gender-adjusted and unadjusted smoking prevalence as an illustration.
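As a worked illustration of this trade-off, the block below writes out one common form of the two estimators. It assumes, and the abstract does not confirm, that the adjusted estimator is the post-stratified one with known population gender proportions $W_g$. The symbols $y_i$, $n_g$, $S^2$ and $S_g^2$ are introduced here for illustration only, and the variance expressions are the standard large-sample textbook approximations for simple random sampling, ignoring the finite population correction.

```latex
\begin{align*}
\hat{p}_{\text{crude}} &= \frac{1}{n}\sum_{i=1}^{n} y_i,
&
\hat{p}_{\text{adj}} &= \sum_{g \in \{m,f\}} W_g\,\hat{p}_g,
\qquad \hat{p}_g = \frac{1}{n_g}\sum_{i \in g} y_i, \\[4pt]
\operatorname{Var}(\hat{p}_{\text{crude}}) &\approx \frac{S^2}{n},
&
\operatorname{Var}(\hat{p}_{\text{adj}}) &\approx
\frac{1}{n}\sum_{g} W_g S_g^2 \;+\; \frac{1}{n^2}\sum_{g} (1 - W_g)\,S_g^2 .
\end{align*}
```

Under these assumptions the first term of $\operatorname{Var}(\hat{p}_{\text{adj}})$ is never larger than $S^2/n$, while the second term is the price of the random stratum sample sizes $n_g$. When male and female prevalences are similar and $n$ is small, the second term can dominate, which matches the qualitative behavior described here.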
We compare the accuracy of adjusted and unadjusted prevalence estimators via simulation. We simulate simple random samples from hypothetical populations in which the proportion of males ranges from 30% to 70%, the smoking prevalence ranges from 15% to 35%, and the ratio of male to female smoking prevalence ranges from 1 to 4. The ranges of gender proportions and smoking prevalences reflect conditions in the 1999-2003 Behavioral Risk Factor Surveillance System (BRFSS) data for Massachusetts. From each population, 10,000 samples are selected, and the ratio of the variance of the adjusted prevalence estimator to the variance of the unadjusted (crude) estimator is computed and plotted against the proportion of males, by population prevalence as well as by population and sample size. The prevalence ratio thresholds above which adjusted prevalence estimators have smaller variances are determined graphically.
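The simulation design described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the adjusted estimator is post-stratified on gender with known population proportions, a hypothetical population size of 10,000, and a fallback to the crude estimate when a sample contains only one gender. The function name `variance_ratio` and all parameter names are illustrative.

```python
import numpy as np


def variance_ratio(prop_male, prevalence, ratio, n,
                   pop_size=10_000, reps=10_000, seed=0):
    """Estimate Var(adjusted) / Var(crude) by simulation under simple random sampling."""
    rng = np.random.default_rng(seed)

    # Split the overall prevalence into gender-specific prevalences so that
    # p_male / p_female == ratio and the weighted mean equals `prevalence`.
    p_female = prevalence / (prop_male * ratio + (1 - prop_male))
    p_male = ratio * p_female

    # Build the finite population as four cell counts.
    n_male = round(pop_size * prop_male)
    n_female = pop_size - n_male
    male_smokers = round(n_male * p_male)
    female_smokers = round(n_female * p_female)
    cells = [male_smokers, n_male - male_smokers,
             female_smokers, n_female - female_smokers]

    # Each row is one simple random sample (without replacement) of size n.
    draws = rng.multivariate_hypergeometric(cells, n, size=reps)
    ms, mn, fs, fn = draws.T.astype(float)

    crude = (ms + fs) / n

    # Post-stratified estimate, weighting by the known population proportions.
    # Assumption: samples with an empty gender stratum keep the crude estimate.
    males, females = ms + mn, fs + fn
    w_male = n_male / pop_size
    ok = (males > 0) & (females > 0)
    adjusted = crude.copy()
    adjusted[ok] = (w_male * ms[ok] / males[ok]
                    + (1 - w_male) * fs[ok] / females[ok])

    return adjusted.var(ddof=1) / crude.var(ddof=1)


if __name__ == "__main__":
    # Equal gender split, 20% prevalence, varying prevalence ratio and sample size.
    for n in (25, 50, 100, 150, 200):
        for r in (1.0, 2.0, 4.0):
            print(n, r, round(variance_ratio(0.5, 0.20, r, n), 3))
```

A ratio below 1 means the adjusted estimator had the smaller variance in that setting. Sweeping `ratio` over a fine grid for each `n` and locating where the ratio crosses 1 would give thresholds analogous to those the authors read off their plots, although the exact values depend on the assumptions above.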
In many practical settings, gender adjustment results in less accuracy. Whether adjustment improves accuracy depends on the sample size, the gender proportions, and the ratio of male to female prevalence. In populations with equal numbers of males and females and a smoking prevalence of 20%, the adjusted prevalence estimators are more accurate when the ratio of male to female prevalence is above 2.4, 1.8, 1.6, 1.4 and 1.3 for sample sizes of 25, 50, 100, 150 and 200, respectively.
Adjustment for covariates will not result in a more accurate prevalence estimator when the ratio of male to female prevalence is close to one, the sample size is small, and the risk factor prevalence is low. For example, when reporting smoking prevalence based on simple random sampling, gender adjustment is recommended only when the sample size is greater than 200.