Scripps Research Translational Institute, Scripps Research, La Jolla, CA, 92037, USA.
Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA, 92037, USA.
Genome Med. 2020 Nov 23;12(1):100. doi: 10.1186/s13073-020-00801-x.
Polygenic risk scores (PRSs) summarize an individual's genetic risk for a disease or trait. These scores are being generated in research and commercial settings to study how they may be used to guide healthcare decisions. PRSs should be updated as genetic knowledge bases improve; however, no guidelines exist for their generation or updating.
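For reference, many PRSs take a simple additive form. The expression below is a sketch of that common form, not necessarily how each score in this study was constructed. The symbols are introduced here for illustration: $M$ is the number of variants in the score, $\hat{\beta}_j$ is the effect weight for the effect allele at variant $j$, and $x_{ij}$ is individual $i$'s dosage of that allele.

```latex
\mathrm{PRS}_i = \sum_{j=1}^{M} \hat{\beta}_j \, x_{ij}, \qquad x_{ij} \in [0, 2]
```

When genotypes are imputed, each $x_{ij}$ is an imputed dosage. Any run-to-run change in imputation therefore propagates directly into $\mathrm{PRS}_i$, even though the weights $\hat{\beta}_j$ (the underlying genetic model) are unchanged.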
Here, we characterize the variability in PRS calculation introduced by a common computational step in their generation: genotype imputation. We evaluated PRS variability when performing genotype imputation using three different pre-phasing tools (Beagle, Eagle, SHAPEIT) and two different imputation tools (Beagle, Minimac4), relative to a whole genome sequencing (WGS)-based gold standard. Fourteen different PRSs spanning different disease architectures and PRS generation approaches were evaluated.
We find that genotype imputation can introduce variability in calculated PRSs at the individual level without any change to the underlying genetic model. The degree of variability differs across algorithms: pre-phasing algorithms with stochastic elements introduce the most score variability. In most cases, PRS variability due to imputation is minor (a change of less than 5 percentile ranks) and does not affect the interpretation of the score. PRS percentile fluctuations are also smaller in the more informative tails of the PRS distribution. However, in rare instances, PRS instability at the individual level can produce individual PRS calculations that differ substantially from the score computed from the whole genome sequence-based gold standard.
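The percentile-rank comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions: an additive score, a reference distribution of scores (for example, from a reference cohort) against which percentiles are computed, and the helper names `prs`, `percentile_rank` and `percentile_shift`, which are all hypothetical and not taken from the study's code.

```python
from bisect import bisect_left, bisect_right


def prs(dosages, weights):
    """Additive PRS (assumed form): sum of effect weight times effect-allele dosage (0-2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def percentile_rank(score, reference):
    """Percentile rank (0-100) of score within a reference distribution; ties count as half."""
    ref = sorted(reference)
    below = bisect_left(ref, score)            # reference scores strictly below
    equal = bisect_right(ref, score) - below   # reference scores equal to score
    return 100.0 * (below + 0.5 * equal) / len(ref)


def percentile_shift(imputed_score, gold_score, reference):
    """Absolute percentile-rank difference between an imputation-based and a gold-standard score."""
    return abs(percentile_rank(imputed_score, reference)
               - percentile_rank(gold_score, reference))
```

With this sketch, `percentile_shift(imputed, wgs, reference) < 5` marks a fluctuation below the 5-percentile-rank level described above. Larger values flag the rare individuals whose imputed score departs noticeably from the WGS-based one.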
Our study highlights some challenges in applying population genetics tools to individual-level genetic analysis, including the return of results. Rare individual-level variability events are masked by a high degree of overall score reproducibility at the population level. To avoid PRS result fluctuations during updates, we suggest that deterministic imputation processes, or the average of multiple iterations of stochastic imputation processes, be used to generate and deliver PRS results.
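The averaging strategy recommended above can be sketched as follows. The sketch assumes that each stochastic imputation run (for example, one run per random seed) yields a list of effect-allele dosages aligned to the same score variants, already parsed from the imputation output. The function names are hypothetical.

```python
from statistics import mean


def prs(dosages, weights):
    """Additive PRS (assumed form): sum of effect weight times effect-allele dosage."""
    return sum(w * d for w, d in zip(weights, dosages))


def averaged_prs(dosage_runs, weights):
    """Mean PRS across several stochastic imputation runs of the same individual."""
    if not dosage_runs:
        raise ValueError("need at least one imputation run")
    return mean(prs(run, weights) for run in dosage_runs)


def averaged_dosages(dosage_runs):
    """Per-variant mean dosage across runs, giving one consensus dosage vector."""
    if not dosage_runs:
        raise ValueError("need at least one imputation run")
    n = len(dosage_runs[0])
    if any(len(run) != n for run in dosage_runs):
        raise ValueError("all runs must cover the same variants")
    return [mean(run[j] for run in dosage_runs) for j in range(n)]
```

Because the additive score is linear in the dosages, averaging the per-run scores and scoring the mean dosages give the same result, up to floating-point rounding. The choice between them mainly depends on whether you want to store one consensus dosage set or several scores.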