Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA.
Stat Med. 2018 Jul 10;37(15):2321-2337. doi: 10.1002/sim.7672. Epub 2018 Apr 22.
An outcome-dependent sampling (ODS) scheme is a cost-effective way to conduct a study. For a study with a continuous primary outcome, an ODS scheme can be implemented in which the expensive exposure is measured only on a simple random sample and on supplemental samples selected from the 2 tails of the primary outcome variable. Given the substantial cost of collecting the primary exposure information, investigators often want to use the available data to study the relationship between a secondary outcome and the measured exposure variable. This is referred to as secondary analysis. Secondary analysis in ODS designs can be tricky, because the ODS sample is not a random sample from the general population. In this article, we use inverse probability weighted and augmented inverse probability weighted estimating equations to analyze the secondary outcome for data obtained from the ODS design. We make no parametric assumptions about the primary and secondary outcomes and specify only the form of the regression mean models, thus allowing an arbitrary error distribution. Our approach is robust to misspecification of the second- and higher-order moments. It also yields more precise parameter estimates by effectively using all available participants. Through simulation studies, we show that the proposed estimator is consistent and asymptotically normal. Data from the Collaborative Perinatal Project are analyzed to illustrate our method.
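To make the inverse probability weighting idea concrete, here is a minimal sketch, not the authors' estimator. It assumes a simplified Bernoulli approximation of the ODS design: every population unit enters the simple-random-sample component with probability p_srs, and units in either tail of the primary outcome Y receive an independent supplemental chance p_tail, so each sampled unit's inclusion probability is known from Y alone. The data-generating model, tail cutoffs, probabilities, and function names (simulate_population, ods_inclusion_probs, ipw_linear_fit, naive_fit) are all hypothetical. The sketch solves only the IPW estimating equation for a linear mean model of the secondary outcome Z on the exposure X; the AIPW augmentation term is not shown.

```python
import numpy as np


def simulate_population(n_pop, rng):
    """Hypothetical model (an assumption): E[Z | X] = 0.5 + 0.3 X, and Z shares error e1 with Y."""
    x = rng.normal(size=n_pop)                              # expensive exposure
    e1 = rng.normal(size=n_pop)
    y = 1.0 + 0.5 * x + e1                                  # continuous primary outcome
    z = 0.5 + 0.3 * x + 0.7 * e1 + rng.normal(size=n_pop)   # secondary outcome
    return x, y, z


def ods_inclusion_probs(y, p_srs, p_tail, lower_q=0.1, upper_q=0.9):
    """Bernoulli approximation of an ODS design (hypothetical cutoffs and probabilities).

    A unit is sampled if it enters the SRS component (prob p_srs) or, when Y lies
    in either tail, the independent supplemental component (prob p_tail).
    """
    a, b = np.quantile(y, [lower_q, upper_q])
    in_tail = (y < a) | (y > b)
    p_supp = np.where(in_tail, p_tail, 0.0)
    return 1.0 - (1.0 - p_srs) * (1.0 - p_supp)


def ipw_linear_fit(x, z, pi):
    """Solve sum_i (1 / pi_i) * D_i * (Z_i - D_i' gamma) = 0 over sampled units,
    where D_i = (1, X_i); this is weighted least squares with weights 1 / pi_i."""
    design = np.column_stack([np.ones_like(x), x])
    w = 1.0 / pi
    xtwx = design.T @ (design * w[:, None])
    xtwz = design.T @ (w * z)
    return np.linalg.solve(xtwx, xtwz)


def naive_fit(x, z):
    """Unweighted least squares that ignores the outcome-dependent selection."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, z, rcond=None)[0]


rng = np.random.default_rng(2018)
x, y, z = simulate_population(200_000, rng)
pi = ods_inclusion_probs(y, p_srs=0.01, p_tail=0.05)
sampled = rng.random(y.size) < pi          # X is measured only for sampled units

gamma_ipw = ipw_linear_fit(x[sampled], z[sampled], pi[sampled])
gamma_naive = naive_fit(x[sampled], z[sampled])
print("IPW (intercept, slope):  ", gamma_ipw)
print("naive (intercept, slope):", gamma_naive)
```

Under these assumptions, the naive slope is pulled upward, because oversampling the tails of Y also oversamples extreme values of the shared error e1 in the direction of X. The weighted fit should land near the true slope of 0.3. Weighting by 1/pi_i undoes the selection only because pi_i is known by design; the precision gains the abstract attributes to the augmented estimator come from additional terms not sketched here.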