Li Yan, Safaeian Mahboobeh, Robbins Hilary A, Graubard Barry I
Joint Program in Survey Methodology, University of Maryland, College Park, MD 20742, USA
Infections and Immunoepidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
Biostatistics. 2015 Jan;16(1):169-78. doi: 10.1093/biostatistics/kxu024. Epub 2014 Jun 6.
Epidemiologic cross-sectional, case-cohort, or case-control studies often select augmentation samples to supplement an existing (baseline) sample, primarily for two reasons: (1) to increase the sample sizes from certain subdomains of interest that were not originally considered in the design of the baseline study and (2) to obtain samples from an extension of the target population. To address these two objectives, two-stage stratified sample designs are considered in which the second-stage stratification, based on the expanded population, is not nested within the first-stage strata. Sample weighting and Taylor linearization variance estimation are provided for these two-stage stratified sample designs, which involve re-stratification and population expansion, for estimating population totals and logistic regression coefficients. Results from limited simulation studies and a logistic regression analysis of a study of human papillomavirus serology are provided.
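To make the weighting idea concrete, the following is a minimal sketch of a generic two-phase weight under non-nested re-stratification. It is not the estimator derived in the paper. It assumes simple random sampling without replacement within strata at both stages, ignores the population-expansion component, and omits variance estimation. The function names `two_phase_weights` and `estimate_total`, the data layout, and the toy numbers are all hypothetical.

```python
from collections import Counter


def two_phase_weights(phase1, phase2_ids, pop_sizes):
    """Weights for second-stage units under a simplified two-phase design.

    phase1     : dict unit_id -> (h, g), where h is the first-stage stratum
                 and g is the second-stage stratum (g need not nest in h).
    phase2_ids : ids of first-stage units that were subsampled at stage two.
    pop_sizes  : dict h -> N_h, population size of first-stage stratum h.

    Assumption (not from the paper): simple random sampling within strata
    at both stages, so the weight is (N_h / n_h) * (n_g / m_g).
    """
    n_h = Counter(h for h, _ in phase1.values())        # stage-1 sample sizes
    n_g = Counter(g for _, g in phase1.values())        # stage-1 units per stage-2 stratum
    m_g = Counter(phase1[i][1] for i in phase2_ids)     # stage-2 sample sizes
    weights = {}
    for i in phase2_ids:
        h, g = phase1[i]
        weights[i] = (pop_sizes[h] / n_h[h]) * (n_g[g] / m_g[g])
    return weights


def estimate_total(weights, y):
    """Weighted estimate of a population total of y."""
    return sum(w * y[i] for i, w in weights.items())


# Toy data: two first-stage strata (A, B) crossed with two second-stage strata (x, y).
phase1 = {1: ("A", "x"), 2: ("A", "y"), 3: ("B", "x"), 4: ("B", "y")}
pop_sizes = {"A": 10, "B": 20}
phase2_ids = {1, 4}
weights = two_phase_weights(phase1, phase2_ids, pop_sizes)
total = estimate_total(weights, {1: 1.0, 4: 2.0})
```

Because the second-stage strata cut across the first-stage strata, each weight combines a factor from the unit's first-stage stratum with a factor from its second-stage stratum. In this toy case the weights sum to the assumed population size of 30.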