Department of Population Health, New York University Grossman School of Medicine, New York, NY, 10016, USA.
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
BMC Med Res Methodol. 2023 May 19;23(1):119. doi: 10.1186/s12874-023-01950-4.
Sub-cohort sampling designs, such as the case-cohort design, play a key role in studying biomarker-disease associations because of their cost-effectiveness. A time-to-event outcome is often the focus in cohort studies, where the research goal is to assess the association between the event risk and risk factors. In this paper, we propose a novel goodness-of-fit two-phase sampling design for time-to-event outcomes when some covariates (e.g., biomarkers) can only be measured on a subgroup of study subjects.
We assume that an external model is available to relate the outcome to the complete covariates. This model can be a well-established risk model, such as the Gail model for breast cancer, the Gleason score for prostate cancer, or the Framingham risk models for heart disease, or it can be built from preliminary data. We propose to oversample subjects with worse goodness-of-fit (GOF), as judged by the external survival model and the observed time-to-event. With the cases and controls sampled using the GOF two-phase design, the inverse sampling probability weighting method is used to estimate the log hazard ratios of both incomplete and complete covariates. We conducted extensive simulations to evaluate the efficiency gain of the proposed GOF two-phase sampling designs over case-cohort study designs.
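The sampling and weighting steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GOF measure, so the sketch assumes a martingale-type residual (event indicator minus the cumulative hazard predicted by the external model), and the function names, the probability floor, and the simulated data are all hypothetical.

```python
import numpy as np

def gof_sampling_probs(event, cum_hazard, n_sample, floor=0.05):
    """Hypothetical GOF-based phase-two inclusion probabilities.

    event      : 1 if the subject had the event, 0 if censored
    cum_hazard : cumulative hazard at the observed time, as predicted
                 by the external model (assumed to be available)
    """
    # Assumed GOF measure: absolute martingale-type residual.
    resid = np.abs(event - cum_hazard)
    probs = n_sample * resid / resid.sum()
    # Clipping keeps the weights bounded; it also means the expected
    # sample size is only approximately n_sample.
    return np.clip(probs, floor, 1.0)

def two_phase_sample(probs, rng):
    """Independent (Poisson) sampling plus inverse-probability weights."""
    selected = rng.random(probs.shape[0]) < probs
    weights = np.where(selected, 1.0 / probs, 0.0)
    return selected, weights

# Simulated cohort (purely illustrative, not the study data).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                        # complete covariate
t_event = rng.exponential(1.0 / (0.02 * np.exp(0.5 * x)))
time = np.minimum(t_event, 10.0)              # administrative censoring at 10
event = (t_event <= 10.0).astype(float)
cum_hazard = 0.02 * np.exp(0.4 * x) * time    # slightly misspecified external model

probs = gof_sampling_probs(event, cum_hazard, n_sample=1000)
selected, weights = two_phase_sample(probs, rng)
```

The resulting `weights` would then be supplied as case weights to a weighted Cox regression fitted on the selected subjects, which is where the log hazard ratios are estimated. Subjects whose outcome is poorly predicted by the external model receive higher inclusion probabilities and therefore smaller weights.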
Through extensive simulations based on a dataset from the New York University Women's Health Study, we showed that the proposed GOF two-phase sampling designs yielded unbiased estimates and were generally more efficient than the standard case-cohort study designs.
In cohort studies with rare outcomes, an important design question is how to select informative subjects so as to reduce sampling costs while maintaining statistical efficiency. Our proposed goodness-of-fit two-phase design provides an efficient alternative to standard case-cohort designs for assessing the association between time-to-event outcomes and risk factors. The method can be conveniently implemented in standard software.