Genetic Technologies Ltd., Fitzroy, Victoria, Australia.
Centre for Epidemiology and Biostatistics, The University of Melbourne, Melbourne, Victoria, Australia.
PLoS One. 2022 Dec 2;17(12):e0278764. doi: 10.1371/journal.pone.0278764. eCollection 2022.
Polygenic risk scores (PRSs) are a promising approach to accurately predict an individual's risk of developing disease. The area under the receiver operating characteristic curve (AUC) of a PRS in a study population is often reported only for models that are adjusted for age and sex, which are known risk factors for the disease of interest and confound the association between the PRS and the disease. This makes comparison of PRSs between studies difficult because the genetic effects cannot be disentangled from the effects of age and sex (which yield a high AUC even without the PRS). In this study, we used data from the UK Biobank and applied the stacked clumping and thresholding method, and a variation called the maximum clumping and thresholding method, to develop PRSs to predict coronary artery disease, hypertension, atrial fibrillation, stroke and type 2 diabetes. We created case-control training datasets in which age and sex were controlled by design. We also excluded prevalent cases to prevent biased estimation of disease risks. The maximum clumping and thresholding PRSs required far fewer single-nucleotide polymorphisms to achieve almost the same discriminatory ability as the stacked clumping and thresholding PRSs. Using the testing datasets, the AUCs for the maximum clumping and thresholding PRSs were 0.599 (95% confidence interval [CI]: 0.585, 0.613) for atrial fibrillation, 0.572 (95% CI: 0.560, 0.584) for coronary artery disease, 0.585 (95% CI: 0.564, 0.605) for type 2 diabetes, 0.559 (95% CI: 0.550, 0.569) for hypertension and 0.514 (95% CI: 0.494, 0.535) for stroke. By developing a PRS using a dataset in which age and sex are controlled by design, we have obtained true estimates of the discriminatory ability of the PRSs alone rather than estimates that include the effects of age and sex.
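To make the reported quantity concrete, the following is a minimal sketch of how an AUC and an approximate 95% CI could be computed for a PRS in a case-control sample. The abstract does not state how the authors computed their AUCs or intervals, so everything here is an assumption for illustration: the function names `auc_mann_whitney` and `hanley_mcneil_ci` are hypothetical, the interval uses the Hanley and McNeil (1982) standard-error approximation, and the scores are simulated rather than taken from the UK Biobank.

```python
import math
import random


def auc_mann_whitney(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney U formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hanley_mcneil_ci(auc, n_cases, n_controls, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil standard error.

    Assumption: this is one common approximation, not necessarily the
    method used in the study.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc * auc)
        + (n_controls - 1) * (q2 - auc * auc)
    ) / (n_cases * n_controls)
    se = math.sqrt(max(variance, 0.0))
    return auc - z * se, auc + z * se


# Simulated, standardized PRS values (illustrative only): cases are shifted
# slightly upward relative to controls, mimicking a weak genetic signal.
random.seed(0)
cases = [random.gauss(0.3, 1.0) for _ in range(500)]
controls = [random.gauss(0.0, 1.0) for _ in range(500)]

auc = auc_mann_whitney(cases, controls)
low, high = hanley_mcneil_ci(auc, len(cases), len(controls))
print(f"AUC = {auc:.3f} (95% CI: {low:.3f}, {high:.3f})")
```

Because the cases and controls in such a dataset would be matched on age and sex by design, an AUC computed this way reflects the PRS alone; in an unmatched sample the same calculation on a model that includes age and sex would mix genetic and non-genetic discrimination.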