Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, CT, USA.
Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
JNCI Cancer Spectr. 2024 Feb 29;8(2). doi: 10.1093/jncics/pkae008.
Models that combine polygenic risk scores with clinical factors to predict the risk of different cancers have been developed, but they have been limited by their polygenic risk score derivation methods and by incomplete selection of clinical variables.
We used UK Biobank to train the best polygenic risk scores for 8 cancers (bladder, breast, colorectal, kidney, lung, ovarian, pancreatic, and prostate cancers) and select relevant clinical variables from 733 baseline traits through extreme gradient boosting (XGBoost). Combining polygenic risk scores and clinical variables, we developed Cox proportional hazards models for risk prediction in these cancers.
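As a reading aid, the general form of a Cox proportional hazards model that takes a polygenic risk score alongside clinical covariates can be written as below. This is the textbook form only; the symbols ($\beta_{\mathrm{PRS}}$, $x_j$, $p$) are illustrative assumptions, and the exact covariates and parameterization used for each cancer are not stated here.

```latex
h(t \mid \mathrm{PRS}, \mathbf{x})
  = h_0(t)\,\exp\!\Big(\beta_{\mathrm{PRS}}\,\mathrm{PRS} + \sum_{j=1}^{p} \beta_j x_j\Big)
```

Here $h_0(t)$ is the unspecified baseline hazard, $\mathrm{PRS}$ is the polygenic risk score, and $x_1, \dots, x_p$ stand for the clinical variables retained after selection. Under this form, $\exp(\beta_{\mathrm{PRS}})$ is the hazard ratio per unit increase in the score with the clinical variables held fixed.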
Our models achieved high prediction accuracy for the 8 cancers, with areas under the curve ranging from 0.618 (95% confidence interval = 0.581 to 0.655) for ovarian cancer to 0.831 (95% confidence interval = 0.817 to 0.845) for lung cancer. Additionally, our models could identify individuals at high risk of developing cancer. For example, the risk of breast cancer for individuals in the top 5% score quantile was nearly 13 times greater than for individuals in the lowest 10% score quantile. Furthermore, we observed a higher proportion of individuals with high polygenic risk scores in the early-onset group but a higher proportion of individuals at high clinical risk in the late-onset group.
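The two kinds of summary reported above, an area under the curve and a comparison of risk between score quantile groups, can be illustrated with a minimal sketch. It assumes a simple binary case/non-case outcome and a crude ratio of observed case proportions; it is not the study's procedure, which is based on Cox models, and the function names `auc` and `risk_ratio` and the toy data are hypothetical.

```python
import math


def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random non-case.

    Tied scores receive their average rank, so a tie counts as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def risk_ratio(scores, labels, top=0.05, bottom=0.10):
    """Case proportion in the top score fraction divided by that in the bottom."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_top = max(1, int(len(order) * top))
    n_bottom = max(1, int(len(order) * bottom))
    rate_top = sum(labels[i] for i in order[-n_top:]) / n_top
    rate_bottom = sum(labels[i] for i in order[:n_bottom]) / n_bottom
    if rate_bottom == 0:
        return math.inf  # no cases observed in the reference group
    return rate_top / rate_bottom


# Toy data (hypothetical): 1 marks an individual who developed the cancer.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))  # 0.75
```

A confidence interval such as those quoted above would need an additional step, for example bootstrap resampling of individuals, which is omitted here.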
Our models demonstrated the potential to predict cancer risk and identify high-risk individuals, and they generalized well across different cancers. Our findings suggest that the polygenic risk score model is more predictive of cancer risk in early-onset patients than in late-onset patients, whereas the clinical risk model is more predictive in late-onset patients. Combining polygenic risk scores and clinical risk factors gave better overall predictive performance than using either alone.