Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA.
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
Stat Med. 2022 Sep 30;41(22):4403-4425. doi: 10.1002/sim.9516. Epub 2022 Jul 7.
Large cohort studies now routinely involve biobanks in which biospecimens are stored for use in future biomarker studies. In such settings, two-phase response-dependent sampling designs involve subsampling individuals in the cohort, assaying their biospecimens to measure an expensive biomarker, and using these data to estimate key parameters of interest under budgetary constraints. When analyses are based on inverse probability weighted estimating functions, recent work has described adaptive two-phase designs in which a preliminary phase of subsampling based on a standard design facilitates approximation of an optimal selection model for a second subsampling phase. In this article, we refine the definition of an optimal subsampling scheme within the framework of adaptive two-phase designs, describe how adaptive two-phase designs can be used when analyses are based on likelihood or conditional likelihood, and consider the setting of a continuous biomarker where the nuisance covariate distribution is estimated nonparametrically at the design and analysis stages as required. Efficiency and robustness issues are investigated. We also explore these methods for the surrogate variable problem and describe a generalization that accommodates multiple stages of phase II subsampling. A study involving individuals with psoriatic arthritis is considered for illustration, where the aim is to assess the association between the biomarker MMP-3 and the development of joint damage.
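The abstract gives no formulas, so the following is only a minimal sketch, under assumptions of our own, of the basic non-adaptive ingredient the article builds on: a two-phase response-dependent design analyzed with an inverse probability weighted (IPW) estimating function. We assume a binary response Y, a cheap phase I covariate Z, an expensive biomarker X, a logistic regression model, and a balanced phase II sample stratified on (Y, Z) with selection probabilities known by design. The functions `simulate_cohort`, `phase2_select` and `ipw_logistic` and all parameter values are hypothetical; the sketch does not implement the adaptive, likelihood-based, or multi-stage designs the article proposes.

```python
import numpy as np

def simulate_cohort(n, beta, rng):
    """Hypothetical phase I cohort: z and y observed for everyone; x is the expensive biomarker."""
    z = rng.binomial(1, 0.5, size=n)
    x = rng.normal(loc=0.5 * z, scale=1.0, size=n)  # biomarker correlated with z
    eta = beta[0] + beta[1] * x + beta[2] * z
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y, x, z

def phase2_select(y, z, n2, rng):
    """Balanced response-dependent design: draw about n2/4 individuals from each (y, z) stratum.
    Returns the phase II selection indicator and each individual's known selection probability."""
    strata = 2 * y + z
    pi = np.zeros(len(y))
    selected = np.zeros(len(y), dtype=bool)
    for s in range(4):
        idx = np.flatnonzero(strata == s)
        if len(idx) == 0:
            continue  # empty stratum: nobody to select
        m = min(len(idx), n2 // 4)
        chosen = rng.choice(idx, size=m, replace=False)
        selected[chosen] = True
        pi[idx] = m / len(idx)
    return selected, pi

def ipw_logistic(y, X, w, n_iter=50, tol=1e-10):
    """Solve the IPW score equation sum_i w_i x_i (y_i - expit(x_i' beta)) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Usage: only the phase II subsample contributes to the estimating function, weighted by 1/pi.
rng = np.random.default_rng(2022)
beta_true = np.array([-1.0, 0.7, 0.5])
y, x, z = simulate_cohort(20000, beta_true, rng)
sel, pi = phase2_select(y, z, 2000, rng)
X2 = np.column_stack([np.ones(sel.sum()), x[sel], z[sel]])
beta_hat = ipw_logistic(y[sel], X2, 1.0 / pi[sel])
print(beta_hat)
```

Because the weights undo the oversampling of rarer (y, z) strata, the IPW estimate targets the cohort-level regression coefficients. An IPW analysis of this kind uses only the selected individuals; likelihood-based analyses, which the article also considers, additionally use the phase I data on unselected individuals, which is why they require the nuisance covariate distribution.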