Henches Léo, Kim Jihye, Yang Zhiyu, Rubinacci Simone, Pires Gabriel, Albiñana Clara, Boetto Christophe, Julienne Hanna, Frouin Arthur, Auvergne Antoine, Suzuki Yuka, Djebali Sarah, Delaneau Olivier, Ganna Andrea, Vilhjálmsson Bjarni, Privé Florian, Aschard Hugues
Institut Pasteur, Université de Paris, Department of Computational Biology, 75015 Paris, France.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
HGG Adv. 2025 May 14;6(3):100457. doi: 10.1016/j.xhgg.2025.100457.
Polygenic risk score (PRS) models trained on genome-wide association study (GWAS) results are set to play a pivotal role in biomedical research addressing multifactorial human diseases. The prospect of using these risk scores in clinical care and public health is generating both enthusiasm and controversy, with varying opinions among experts about their strengths and limitations. The performance of existing polygenic scores is still limited but is expected to improve with increasing GWAS sample sizes and the development of new, more powerful methods. Theoretically, the variance explained by a PRS can be as high as the total additive genetic variance, but it is unclear how much of that variance has already been captured by existing PRSs. Here, we conducted a retrospective analysis to assess progress in PRS prediction accuracy since the publication of the first large-scale GWASs, using data from six common human diseases with sufficient GWAS information. We show that although PRS accuracy has grown rapidly over the years, the pace of improvement from recent GWASs has decreased substantially, suggesting that merely increasing GWAS sample sizes may lead to only modest improvements in risk discrimination. We next investigated the factors influencing the maximum achievable prediction using whole-genome sequencing data from 125,000 UK Biobank participants and state-of-the-art modeling of polygenic outcomes. Our analyses suggest that increasing the variant coverage of PRSs, using either more imputed variants or sequencing data, is a key component for future improvements in prediction accuracy.