School of Nursing, University of Delaware, Newark, DE 19716, USA.
Nurs Res. 2012 May-Jun;61(3):231-7. doi: 10.1097/NNR.0b013e3182533403.
The National Center for Health Statistics conducts the National Health and Nutrition Examination Survey and other national surveys with probability-based complex sample designs. Goals of national surveys are to provide valid data for the population of the United States. Analyses of data from population surveys present unique challenges in the research process but are valuable avenues to study the health of the United States population.
The aim of this study was to demonstrate the importance of using complex data analysis techniques for data obtained with a complex multistage sampling design and to provide an example of analysis using the SPSS Complex Samples procedure.
Challenges and solutions specific to secondary data analysis of national databases are illustrated using the National Health and Nutrition Examination Survey as the exemplar.
Oversampling of small or sensitive groups provides necessary estimates of variability within small groups. Use of weights without the Complex Samples procedure accurately estimates population means and frequencies from the sample after accounting for over- or undersampling of specific groups. However, weighting alone leads to inappropriate population estimates of variability, because they are computed as if the measures were from the entire population rather than from a sample. The SPSS Complex Samples procedure allows inclusion of all sampling design elements: stratification, clusters, and weights.
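The contrast between weighting alone and a design-based analysis can be sketched outside SPSS. The following is a minimal Python illustration, not the SPSS Complex Samples implementation. The toy records, the function names, and the choice of a with-replacement Taylor linearization estimator are assumptions made for this example. It treats "weighting alone" as reading the weights as frequency counts, so the apparent sample size equals the sum of the weights.

```python
import math
from collections import defaultdict

# Toy records invented for illustration: (stratum, psu, weight, value).
records = [
    (1, 1, 100.0, 10.0), (1, 1, 100.0, 12.0),
    (1, 2, 100.0, 14.0), (1, 2, 100.0, 16.0),
    (2, 1, 300.0, 20.0), (2, 1, 300.0, 22.0),
    (2, 2, 300.0, 24.0), (2, 2, 300.0, 26.0),
]


def weighted_mean(rows):
    """Weighted estimate of the population mean."""
    total_w = sum(w for _, _, w, _ in rows)
    return sum(w * y for _, _, w, y in rows) / total_w


def naive_se(rows):
    """Standard error when weights are read as frequency counts.

    Assumption for this sketch: the apparent sample size is the sum of
    the weights, so the data are treated as if they were the population.
    """
    mean = weighted_mean(rows)
    total_w = sum(w for _, _, w, _ in rows)
    s2 = sum(w * (y - mean) ** 2 for _, _, w, y in rows) / (total_w - 1)
    return math.sqrt(s2 / total_w)


def design_se(rows):
    """Standard error using strata, clusters (PSUs), and weights.

    Taylor linearization for a weighted mean, with PSUs treated as
    sampled with replacement within each stratum.
    """
    mean = weighted_mean(rows)
    total_w = sum(w for _, _, w, _ in rows)
    # Sum the linearized values within each PSU of each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w
    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum} needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return math.sqrt(variance)


if __name__ == "__main__":
    print("weighted mean:", weighted_mean(records))
    print("SE, weights only:", naive_se(records))
    print("SE, full design:", design_se(records))
```

Both approaches return the same point estimate, because the mean depends only on the weights. They differ in the standard error: in this toy data set the weights-only value is far smaller than the design-based value, which is the understatement of variability that the abstract warns about.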
Use of national data sets allows use of extensive, expensive, and well-documented survey data for exploratory questions but limits analysis to those variables included in the data set. The large sample permits examination of multiple predictors and interactive relationships. Merging data files, availability of data in several waves of surveys, and complex sampling are techniques used to provide a representative sample but present unique challenges. With sophisticated data analysis techniques, use of these data is optimized.