Fred Hutchinson Cancer Center, 1100 Fairview Ave N, M3-B232, Seattle, Washington, 98109, USA.
Lifetime Data Anal. 2024 Jul;30(3):549-571. doi: 10.1007/s10985-024-09626-x. Epub 2024 May 28.
Risk stratification based on prediction models has become increasingly important in preventing and managing chronic diseases. However, because of cost and time constraints, not every population has the resources to collect detailed individual-level information on enough people to develop risk prediction models. A more practical approach is to use prediction models developed from existing studies and calibrate them with relevant summary-level information from the target population. Many existing studies were conducted under the population-based case-control design. Gail et al. (J Natl Cancer Inst 81:1879-1886, 1989) proposed combining the odds ratio estimates obtained from case-control data with the disease incidence rates from the target population to obtain the baseline hazard function, and thereby the pure risk of developing disease. However, that approach requires the risk factor distribution of cases from the case-control studies to be the same as in the target population; if this requirement is violated, the risk estimates may be biased. In this article, we propose two novel weighted estimating equation approaches that calibrate the baseline risk by leveraging summary information on (some) risk factors, in addition to disease-free probabilities, from the target population. We establish the consistency and asymptotic normality of the proposed estimators. Extensive simulation studies and an application to colorectal cancer studies demonstrate that the proposed estimators reduce bias well in finite samples.
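To make the Gail et al. calibration step concrete, the following is a minimal sketch of how odds ratio estimates and target-population incidence rates might be combined into a baseline hazard and a pure risk. It illustrates the earlier approach summarized in the abstract, not the weighted estimating equation estimators proposed in the article. The function names and inputs are hypothetical, and the sketch assumes that relative risks are constant over time, that hazards are piecewise constant over consecutive one-year intervals, and that one minus the attributable risk is estimated as the average of 1/RR over the cases.

```python
import math


def baseline_hazard(incidence_rate, case_relative_risks):
    """Scale a composite incidence rate down to the baseline hazard.

    Assumption: 1 - AR (one minus attributable risk) is estimated as the
    mean of 1/RR over the cases in the case-control sample.
    """
    one_minus_ar = sum(1.0 / r for r in case_relative_risks) / len(case_relative_risks)
    return incidence_rate * one_minus_ar


def pure_risk(incidence_rates, case_relative_risks, relative_risk):
    """Pure risk (no competing risks) over consecutive one-year intervals.

    incidence_rates: composite yearly incidence rates in the target population.
    relative_risk: relative risk of the individual versus the baseline profile.
    """
    cumulative_baseline = sum(
        baseline_hazard(rate, case_relative_risks) for rate in incidence_rates
    )
    return 1.0 - math.exp(-relative_risk * cumulative_baseline)


# Hypothetical inputs: relative risks of four sampled cases and a constant
# yearly incidence rate of 2 per 1,000 over a five-year horizon.
example_case_rrs = [1.0, 2.0, 2.0, 4.0]
example_risk = pure_risk([0.002] * 5, example_case_rrs, relative_risk=2.0)
```

The scaling factor is computed from the risk factor distribution of the sampled cases, which is why the result can be biased when that distribution differs from the one in the target population, the limitation that the abstract highlights.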