Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
Hum Genet. 2024 Dec;143(12):1401-1431. doi: 10.1007/s00439-024-02716-8. Epub 2024 Nov 14.
This scoping review aims to identify and evaluate the landscape of Polygenic Risk Score (PRS)-based methods for genomic prediction from 2013 to 2023, highlighting their advancements, key concepts, and existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing a different statistical framework aimed at enhancing prediction accuracy, processing speed and memory efficiency. Despite notable advancements, challenges persist, including unrealistic assumptions regarding the sample sizes and trait polygenicity needed for accurate prediction, as well as limitations in exploring hyper-parameter spaces and considering environmental interactions. We included studies focusing on PRS-based methods for risk prediction that underwent methodological evaluation using valid approaches and released computational tools/software. Additionally, we restricted our selection to studies involving human participants that were published in English. This review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus and Web of Science databases. Searches also covered grey literature sources, such as pre-print servers (e.g., bioRxiv), and articles recommended by experts to ensure comprehensive and diverse coverage of relevant records. The review identified 34 studies detailing 37 genomic prediction methods, the majority of which rely on linkage disequilibrium (LD) information and require hyper-parameter tuning. Nine methods integrate functional/gene annotation and 12 are suitable for cross-ancestry genomic prediction, while only one considers gene-environment (GxE) interaction. Some methods require individual-level data, but most use summary statistics, which offers flexibility. Despite progress, challenges remain. These include computational complexity and the need for large sample sizes to achieve high prediction accuracy. Furthermore, recent methods vary in effectiveness across traits, with absolute accuracies often falling short of clinical utility. Transferability across ancestries varies, influenced by trait heritability and the diversity of training data, while handling admixed populations remains challenging. Additionally, the absence of standard error measurements for individual PRSs, which are crucial in clinical settings, represents a critical gap. Another issue is the lack of customizable graphical visualization tools in current software packages. While genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and pursuing future research directions will lead to the development of more universally applicable, robust, and clinically relevant genomic prediction tools.