Lemeshow S, Letenneur L, Dartigues J F, Lafont S, Orgogozo J M, Commenges D
Department of Biostatistics and Epidemiology, School of Public Health, University of Massachusetts, Amherst 01003, USA.
Am J Epidemiol. 1998 Aug 1;148(3):298-306. doi: 10.1093/oxfordjournals.aje.a009639.
Epidemiologists are increasingly looking to large-scale sample surveys to provide data for studies of the associations between known or suspected risk factors and disease. More often than not, widely available statistical software packages have been used to analyze such data, particularly when multivariable modeling is involved. Such packages assume that the data have resulted from simple random samples. However, when the survey design incorporates such features as clustering and stratification, the results of statistical analyses based on this assumption can be incorrect. The authors utilized data from the PAQUID (Personnes Agees Quid) study, collected periodically from 1988 to 1996, to illustrate the ease of performing a "design-based" (vs. a "model-based") analysis of complex survey data, and they compared the results obtained using both approaches. The PAQUID study is a stratified cluster sample of elderly community residents in the southwestern departments of Gironde and Dordogne, France. In the illustration presented, in which 3,777 community residents aged 65 years or older were selected to permit identification of baseline and lifetime factors that might be related to cognitive loss, dementia, and Alzheimer's disease, measures of association (such as odds ratios and their associated standard errors) were comparable for both analytical strategies. However, this may not be the case for other examples. Descriptive measures (such as estimates of means and proportions) may be more seriously compromised by the decision to ignore the sampling design. The availability of modern statistical packages with survey analysis capabilities should encourage data analysts to perform design-based analyses whenever possible.
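The contrast between the two analytical strategies can be illustrated with a minimal sketch for a descriptive measure (a proportion). The abstract does not state which estimators were used, so the code below is an assumption: it uses a common textbook approach, a weighted ratio estimator with a Taylor-linearized variance computed from cluster totals within strata (treating clusters as sampled with replacement). The function names, the record layout, and the toy data are hypothetical and are not taken from the PAQUID study.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (stratum, cluster, sampling weight, outcome 0/1).
# These values are invented for illustration; they are NOT PAQUID data.
TOY_ROWS = [
    ("urban", "A", 10.0, 1), ("urban", "A", 10.0, 1), ("urban", "A", 10.0, 1),
    ("urban", "B", 10.0, 0), ("urban", "B", 10.0, 0), ("urban", "B", 10.0, 0),
    ("rural", "C", 30.0, 1), ("rural", "C", 30.0, 1), ("rural", "C", 30.0, 0),
    ("rural", "D", 30.0, 0), ("rural", "D", 30.0, 0), ("rural", "D", 30.0, 0),
]


def naive_mean_se(rows):
    """Analysis that ignores the design: unweighted mean, SRS standard error."""
    ys = [y for _, _, _, y in rows]
    n = len(ys)
    mean = sum(ys) / n
    sample_var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, math.sqrt(sample_var / n)


def design_mean_se(rows):
    """Design-based analysis: weighted mean with a linearized standard error.

    Assumes at least two clusters per stratum and ignores finite
    population corrections (with-replacement approximation).
    """
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Sum the linearized values w * (y - mean) / total_w within each cluster.
    cluster_totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        cluster_totals[stratum][cluster] += w * (y - mean) / total_w

    # Between-cluster variability within each stratum drives the variance.
    variance = 0.0
    for stratum, clusters in cluster_totals.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} needs at least 2 clusters")
        z_bar = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum(
            (z - z_bar) ** 2 for z in clusters.values()
        )
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    for label, fn in (("naive", naive_mean_se), ("design-based", design_mean_se)):
        estimate, se = fn(TOY_ROWS)
        print(f"{label:>12}: estimate={estimate:.4f}  SE={se:.4f}")
```

On this toy data, where outcomes are strongly correlated within clusters and weights differ between strata, the weighted estimate differs from the unweighted one and the design-based standard error is larger than the naive one. In practice, a statistical package with survey analysis capabilities would be used rather than hand-written code.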