Lyles Robert H, Mitchell Emily M, Weinberg Clarice R, Umbach David M, Schisterman Enrique F
Department of Biostatistics and Bioinformatics, The Rollins School of Public Health of Emory University, 1518 Clifton Rd. N.E., Mailstop 1518-002-3AA, Atlanta, Georgia 30322, U.S.A.
Epidemiology Branch, Division of Intramural Population Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, U.S.A.
Biometrics. 2016 Sep;72(3):965-75. doi: 10.1111/biom.12489. Epub 2016 Mar 9.
Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718-726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call "(y,c)-pooling," which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure.
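The difference between y-pooling and (y,c)-pooling at the allocation stage can be illustrated with a minimal sketch. This is not the authors' procedure: the function `allocate_pools`, the subject fields `id`, `y`, and `c`, the random assignment within strata, the single binary covariate, and the choice to keep leftover subjects as a smaller pool are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def allocate_pools(subjects, strata_key, pool_size, seed=0):
    """Randomly group subjects into pools of at most `pool_size` within each stratum.

    `strata_key` maps a subject to the stratum it belongs to. Leftover subjects
    in a stratum form a smaller final pool (an illustrative assumption).
    """
    rng = random.Random(seed)

    # Collect subjects by stratum.
    strata = defaultdict(list)
    for subject in subjects:
        strata[strata_key(subject)].append(subject)

    pools = []
    for key in sorted(strata):
        members = list(strata[key])
        rng.shuffle(members)  # random allocation within the stratum
        for start in range(0, len(members), pool_size):
            pools.append((key, members[start:start + pool_size]))
    return pools


# Hypothetical data: outcome y (case = 1, control = 0) and one binary covariate c.
subjects = [{"id": i, "y": i % 2, "c": (i // 2) % 2} for i in range(12)]

# y-pooling: strata defined by the outcome only.
y_pools = allocate_pools(subjects, lambda s: (s["y"],), pool_size=2)

# (y,c)-pooling: strata defined jointly by the outcome and the covariate.
yc_pools = allocate_pools(subjects, lambda s: (s["y"], s["c"]), pool_size=2)

for key, members in yc_pools:
    print(key, [m["id"] for m in members])
```

Under (y,c)-pooling every pool is homogeneous in both the outcome and the covariate, whereas under y-pooling a pool may mix covariate values. The sketch covers only the allocation step; the pooled assay and the set-based logistic regression with offsets are not modeled here.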