Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, SW7 2AZ, United Kingdom.
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, United States.
Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad010.
Researchers interested in understanding the relationship between a readily available longitudinal binary outcome and a novel biomarker exposure can be confronted with ascertainment costs that limit sample size. In such settings, two-phase studies can be cost-effective solutions that allow researchers to target informative individuals for exposure ascertainment and increase estimation precision for time-varying and/or time-fixed exposure coefficients. In this paper, we introduce a novel class of residual-dependent sampling (RDS) designs that select informative individuals using data available on the longitudinal outcome and inexpensive covariates. Together with the RDS designs, we propose a semiparametric analysis approach that efficiently uses all data to estimate the parameters. We describe a numerically stable and computationally efficient EM algorithm to maximize the semiparametric likelihood. We examine the finite-sample operating characteristics of the proposed approaches through extensive simulation studies and compare the efficiency of our designs and analysis approach with that of existing ones. We illustrate the usefulness of the proposed RDS designs and analysis method in practice by studying the association between a genetic marker and poor lung function among patients enrolled in the Lung Health Study (Connett et al., 1993).
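To make the idea of selecting individuals from phase-one data concrete, the following is a minimal Python sketch of one way a residual-based selection rule could look. It is not the RDS design or the semiparametric EM analysis from the paper, whose details the abstract does not give. Everything in it is an assumption made for illustration: the simulated data, the working logistic model that ignores within-subject correlation, the use of summed raw residuals as the subject-level score, the rule of sampling both tails equally, and the helper names `fit_logistic` and `select_by_residual`.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a working logistic regression by Newton-Raphson.

    Assumption: visits are treated as independent; this is only a
    phase-one working model that omits the expensive exposure.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def select_by_residual(subject_ids, X, y, n_select):
    """Pick subjects whose summed residuals are most extreme.

    Hypothetical rule: half of the phase-two sample from each tail
    of the subject-level residual score.
    """
    beta = fit_logistic(X, y)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    resid = y - p
    ids, inverse = np.unique(subject_ids, return_inverse=True)
    if n_select > len(ids):
        raise ValueError("n_select exceeds the number of subjects")
    # Sum the visit-level residuals within each subject.
    score = np.bincount(inverse, weights=resid, minlength=len(ids))
    order = np.argsort(score)
    n_low = n_select // 2
    n_high = n_select - n_low
    chosen = np.concatenate([order[:n_low], order[len(order) - n_high:]])
    return ids[chosen], score


# Simulated phase-one data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_subjects, n_visits = 200, 5
subject_ids = np.repeat(np.arange(n_subjects), n_visits)
visit_time = np.tile(np.arange(n_visits), n_subjects).astype(float)
z = rng.normal(size=n_subjects)                # inexpensive covariate
b = rng.normal(scale=1.0, size=n_subjects)     # unobserved subject effect
eta = -0.5 + 0.3 * z[subject_ids] + 0.1 * visit_time + b[subject_ids]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
X = np.column_stack([np.ones_like(visit_time), z[subject_ids], visit_time])

selected, score = select_by_residual(subject_ids, X, y, n_select=40)
print(len(selected), "subjects selected for exposure ascertainment")
```

In this sketch, subjects whose outcomes are poorly explained by the inexpensive covariates receive large positive or negative scores and are prioritized for the expensive measurement. A valid analysis of data collected this way has to account for the outcome-dependent selection, which is the role the abstract assigns to the semiparametric likelihood approach.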