Spittal Matthew J, Carlin John B, Currier Dianne, Downes Marnie, English Dallas R, Gordon Ian, Pirkis Jane, Gurrin Lyle
Centre for Mental Health, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, 3010, Australia.
Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, 3010, Australia.
BMC Public Health. 2016 Oct 31;16(Suppl 3):1062. doi: 10.1186/s12889-016-3699-0.
The Australian Longitudinal Study on Male Health (Ten to Men) used a complex sampling scheme to identify potential participants for the baseline survey. This raises important questions about when and how to adjust for the sampling design in analyses of the baseline survey data.
We describe the sampling scheme used in Ten to Men focusing on four important elements: stratification, multi-stage sampling, clustering and sample weights. We discuss how these elements fit together when using baseline data to estimate a population parameter (e.g., population mean or prevalence) or to estimate the association between an exposure and an outcome (e.g., an odds ratio). We illustrate this with examples using a continuous outcome (weight in kilograms) and a binary outcome (smoking status).
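The minimal sketch below makes the difference between weighted and unweighted estimation concrete. It computes a population mean, a prevalence and an odds ratio with and without sampling weights. It is an illustration under stated assumptions, not the estimation code used for Ten to Men. The records are invented, the weights are taken as given (in practice they come from the survey's weighting procedure), and the function names `weighted_mean` and `weighted_odds_ratio` are hypothetical. In a design-based analysis, stratum and cluster identifiers do not change these point estimates; they affect only the standard errors.

```python
# Hypothetical toy records: (sampling weight, body weight in kg, smoker 0/1, exposed 0/1).
# These numbers are invented for illustration; they are not Ten to Men data.
records = [
    (2.0, 78.0, 0, 0),
    (1.0, 90.0, 1, 1),
    (3.0, 80.0, 0, 1),
    (1.0, 88.0, 1, 0),
    (2.0, 75.0, 1, 1),
    (1.0, 92.0, 0, 0),
]


def weighted_mean(values, weights):
    """Ratio estimator sum(w*y) / sum(w); with 0/1 values it estimates a prevalence."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def weighted_odds_ratio(exposed, outcome, weights):
    """Odds ratio from a 2x2 table whose cells are sums of sampling weights."""
    a = b = c = d = 0.0
    for e, y, w in zip(exposed, outcome, weights):
        if e and y:
            a += w  # exposed, outcome present
        elif e:
            b += w  # exposed, outcome absent
        elif y:
            c += w  # unexposed, outcome present
        else:
            d += w  # unexposed, outcome absent
    return (a * d) / (b * c)


w = [r[0] for r in records]
kg = [r[1] for r in records]
smoker = [r[2] for r in records]
exposed = [r[3] for r in records]
ones = [1.0] * len(records)  # an unweighted analysis gives every record weight 1

print("mean kg     unweighted %.2f  weighted %.2f"
      % (weighted_mean(kg, ones), weighted_mean(kg, w)))
print("prevalence  unweighted %.2f  weighted %.2f"
      % (weighted_mean(smoker, ones), weighted_mean(smoker, w)))
print("odds ratio  unweighted %.2f  weighted %.2f"
      % (weighted_odds_ratio(exposed, smoker, ones),
         weighted_odds_ratio(exposed, smoker, w)))
```

In this toy data the heavier men carry smaller weights, so the weighted mean (81.6 kg) falls below the unweighted mean (about 83.8 kg). For a single binary exposure, the weighted 2x2 odds ratio equals the odds ratio from a weighted logistic regression.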
Estimates of a population mean or disease prevalence from the Ten to Men baseline data depend on the extent to which an analysis accounts for the sampling design. Estimates of mean weight and smoking prevalence are larger in unweighted than in weighted analyses (mean = 83.9 kg vs. 81.4 kg; prevalence = 18.0% vs. 16.7%, unweighted vs. weighted respectively). The standard error of the mean is 1.03 times larger in an analysis that acknowledges the hierarchical (clustered) structure of the data than in one that does not; for smoking prevalence, the corresponding ratio is 1.07. Measures of association (mean group differences, odds ratios) are generally similar whether the analysis is weighted or unweighted and whether or not it adjusts for clustering.
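The standard-error comparison can be sketched with Taylor linearization, one common design-based variance method. This is an assumption, since the abstract does not say which variance method the authors used. The hypothetical function `linearized_se` treats primary sampling units (PSUs) as drawn with replacement within strata and ignores finite population corrections. A second call with every man as his own PSU gives an SE that ignores clustering. The ratio of the two SEs is the kind of quantity reported above. The data are invented and deliberately clustered.

```python
import math
from collections import defaultdict


def linearized_se(y, w, strata, psu):
    """Taylor-linearization standard error of the weighted mean sum(w*y)/sum(w).

    Assumes PSUs are treated as sampled with replacement within strata and
    ignores finite population corrections; each stratum needs >= 2 PSUs.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized values for the ratio estimator.
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]
    # Add up the linearized values within each PSU, keyed by stratum.
    totals = defaultdict(lambda: defaultdict(float))
    for zi, h, c in zip(z, strata, psu):
        totals[h][c] += zi
    var = 0.0
    for by_psu in totals.values():
        t = list(by_psu.values())
        n_h = len(t)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        t_bar = sum(t) / n_h
        # Between-PSU variability of the totals within this stratum.
        var += n_h / (n_h - 1) * sum((ti - t_bar) ** 2 for ti in t)
    return math.sqrt(var)


# Hypothetical toy data: 2 strata x 2 PSUs x 2 men, equal weights, body weight in kg.
# Men in the same PSU have similar values, i.e. the data are strongly clustered.
y = [80.0, 82.0, 90.0, 92.0, 70.0, 72.0, 85.0, 87.0]
w = [1.0] * 8
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"]

se_design = linearized_se(y, w, strata, psu)
# Ignore clustering: every man is his own PSU (strata and weights are kept).
se_indep = linearized_se(y, w, strata, list(range(len(y))))
print("SE with clustering    %.3f" % se_design)
print("SE ignoring clusters  %.3f" % se_indep)
print("ratio                 %.2f" % (se_design / se_indep))
```

Passing 0/1 smoking indicators as `y` gives the corresponding standard error for a prevalence. The toy PSUs are unusually homogeneous, so the ratio here is far larger than the ratios reported for Ten to Men. With equal weights, a single stratum and one person per PSU, the function reduces to the familiar s/√n.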
The extent to which an analysis of the Ten to Men baseline data should account for the sampling design depends on the research question. When the goal is to estimate the prevalence of a disease or risk factor in the population, or the magnitude of a population-level exposure-outcome association, our advice is to adopt an analysis that respects the sampling design.