Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China.
Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac039.
Polygenic scores (PGS) are important tools for the genetic prediction of common diseases and disease-related complex traits, facilitating the development of precision medicine. Unfortunately, despite the critical importance of PGS and the large number of PGS methods developed recently, few comprehensive comparison studies have evaluated their effectiveness. To fill this knowledge gap, we performed a comprehensive comparison of 12 different PGS methods. The comparison included internal evaluations on 25 quantitative and 25 binary traits within the UK Biobank, with sample sizes ranging from 147 408 to 336 573, and external evaluations via 25 cross-study and 112 cross-ancestry analyses on summary statistics from multiple genome-wide association studies, with sample sizes ranging from 1415 to 329 345. We evaluated the prediction accuracy, computational scalability, robustness and transferability of the PGS methods across datasets and/or genetic ancestries, providing practical guidelines for choosing among PGS methods. Beyond method comparison, we present a simple aggregation strategy that combines multiple PGS from different methods, taking advantage of their distinct strengths to achieve stable and superior prediction performance. To facilitate future applications of PGS, we also developed a PGS webserver (http://www.pgs-server.com/) that allows users to upload summary statistics and fit different PGS methods to the data directly. We hope that our results, method and webserver will facilitate the routine application of PGS across different research areas.
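The abstract does not say how the PGS from different methods are combined. As a minimal sketch, assuming a linear combination for a quantitative trait whose weights (plus an intercept) are fitted by ordinary least squares on a validation sample, the aggregation could look like the code below. It is not necessarily the authors' exact procedure. The helper names (`fit_aggregation_weights`, `aggregate`, `r_squared`) are hypothetical, and R² is used here only as an illustrative accuracy measure.

```python
"""Hypothetical sketch: aggregate several PGS into one score by fitting
linear weights on a validation set (ordinary least squares with intercept).
Assumption: quantitative trait; this is not claimed to be the paper's method."""

from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_aggregation_weights(scores: Sequence[Sequence[float]],
                            y: Sequence[float]) -> List[float]:
    """scores[i][k] is the PGS from method k for individual i.
    Returns [intercept, w_1, ..., w_K] minimizing the squared error."""
    X = [[1.0] + list(row) for row in scores]  # prepend an intercept column
    p = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return _solve(xtx, xty)


def aggregate(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Combined score: intercept + weighted sum of the individual PGS."""
    b0, *w = weights
    return [b0 + sum(wk * sk for wk, sk in zip(w, row)) for row in scores]


def r_squared(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Proportion of phenotypic variance explained by the prediction."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data: two PGS per individual; phenotype built as 1 + 2*s1 + 0.5*s2.
    scores = [[0, 1], [1, 0], [2, 3], [3, 1], [1, 2]]
    y = [1.5, 3.0, 6.5, 7.5, 4.0]
    w = fit_aggregation_weights(scores, y)
    print(w, r_squared(y, aggregate(scores, w)))
```

In practice the weights would be fitted on a validation sample kept separate from the test sample used to report accuracy, so that the combined score is not evaluated on the data it was tuned on. For a binary trait, a logistic model would be the natural analogue of the least-squares fit.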