Emerson S S, Emerson J C
Arizona Cancer Center, Tucson 85724.
Stat Med. 1993 Jan 15;12(1):3-12. doi: 10.1002/sim.4780120103.
When comparing disease incidence rates across several subpopulations, epidemiologists often use direct standardization to adjust for potential confounding variables. In population-based studies, however, the data are often incompletely classified with respect to membership in the subpopulations of interest. In such a situation, one often assumes that the cases with missing data have the same distribution as the complete cases, that is, that the data are missing completely at random. In this setting, we derive variance estimates for the directly standardized rates that account for the use of incomplete data. We illustrate these methods with data from a study of the incidence of gastrointestinal cancer by immigrant status, in which birthplace data are often incomplete.
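As a rough illustration of the setting described above, the following Python sketch computes a directly standardized rate for one subpopulation when some cases cannot be classified. It is not the authors' estimator, and it does not reproduce their variance formulas. It assumes, for illustration only, that the missing-completely-at-random assumption is applied within each stratum (for example, each age group), so that unclassified cases are allocated to the subpopulation in proportion to its share of the classified cases. The function name and all argument names are hypothetical.

```python
def adjusted_standardized_rate(cases_in_group, cases_classified,
                               cases_unclassified, person_years, std_weights):
    """Directly standardized rate for one subpopulation (illustrative sketch).

    All arguments are sequences indexed by stratum (e.g. age group):
      cases_in_group      classified cases belonging to the subpopulation
      cases_classified    classified cases across all subpopulations
      cases_unclassified  cases whose subpopulation is unknown
      person_years        person-years at risk in the subpopulation
      std_weights         standard-population weights (normalized below)
    """
    total_weight = sum(std_weights)
    rate = 0.0
    for c, n, u, py, w in zip(cases_in_group, cases_classified,
                              cases_unclassified, person_years, std_weights):
        if n == 0:
            if u > 0:
                # Assumption of this sketch: no classified cases means there
                # is no basis for allocating the unclassified ones.
                raise ValueError("cannot allocate unclassified cases in a "
                                 "stratum with no classified cases")
            inflation = 1.0
        else:
            # Under MCAR, scale classified counts up to the stratum total.
            inflation = (n + u) / n
        stratum_rate = c * inflation / py
        rate += (w / total_weight) * stratum_rate
    return rate


# Made-up numbers, two strata, for demonstration only.
example_rate = adjusted_standardized_rate(
    cases_in_group=[10, 20],
    cases_classified=[40, 80],
    cases_unclassified=[10, 20],
    person_years=[1000, 2000],
    std_weights=[0.5, 0.5],
)
print(example_rate)
```

Because the allocated counts depend on the classified proportions, a variance estimate that treats them as observed counts would understate the uncertainty; accounting for that is the problem the abstract addresses.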