Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 9RX, United Kingdom.
Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, United Kingdom.
Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae551.
Polygenic scoring is an approach for estimating an individual's likelihood of a given outcome. Polygenic scores are typically calculated from genome-wide association study (GWAS) summary statistics and individual-level genotype data for the target sample. Going from genotype data to interpretable polygenic scores involves many steps, and many methods are available, which limits the accessibility of polygenic scores for research and clinical application. Studies in ancestrally diverse populations face additional challenges. We have implemented the leading polygenic scoring methodologies within an easy-to-use pipeline called GenoPred.
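In its most common form, a polygenic score for one individual is a weighted sum over variants of the individual's effect-allele dosage multiplied by a per-variant weight derived from GWAS summary statistics. The sketch below illustrates only that final scoring step under this assumption. The function names and toy data are hypothetical, and it does not reproduce GenoPred's code or any particular method's weight estimation.

```python
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele counts per variant (0, 1 or 2; imputed dosages may be fractional)
    weights: per-variant weights, e.g. GWAS effect sizes after any method-specific adjustment
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


def score_sample(genotypes: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Score every individual; each row holds one individual's dosages."""
    return [polygenic_score(row, weights) for row in genotypes]


# Hypothetical toy data: two individuals, three variants
weights = [0.10, -0.05, 0.20]
genotypes = [[0, 1, 2], [2, 0, 1]]
print(score_sample(genotypes, weights))
```

The methods a pipeline such as GenoPred wraps differ mainly in how the per-variant weights are derived from the GWAS; the weighted-sum step shown here is shared by most of them.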
Here, we present the GenoPred pipeline, an easy-to-use, high-performance, reference-standardized, and reproducible workflow for polygenic scoring. It requires minimal inputs and offers various configuration options to cater to a range of use cases. GenoPred implements a comprehensive set of analyses, including genotype and GWAS quality control, target sample ancestry inference, polygenic score file generation using a range of leading methods, and target sample scoring. GenoPred standardizes the polygenic scoring process using reference genetic data, providing interpretable polygenic scores. The pipeline is applicable to GWAS and target data from any population within the reference, facilitating studies of diverse ancestry. GenoPred is a Snakemake pipeline with associated Conda software environments, ensuring reproducibility. We apply the pipeline to UK Biobank data, demonstrating the pipeline's simplicity, efficiency, and performance. The GenoPred pipeline provides a novel resource for polygenic scoring, integrating a range of complex processes within an easy-to-use framework. GenoPred widens access to the leading polygenic scoring methodologies and their application to studies of diverse ancestry.
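Raw polygenic scores have no natural scale, so one way to make them interpretable is to express each target score relative to the distribution of the same score in a reference sample. The sketch below assumes this simple z-score approach (subtract the reference mean, divide by the reference standard deviation); it is an illustration of the general idea, not a description of GenoPred's exact procedure, and it omits the ancestry matching between target and reference that the abstract alludes to. The function name and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def standardize_to_reference(target_scores: Sequence[float],
                             reference_scores: Sequence[float]) -> List[float]:
    """Express target scores as z-scores relative to a reference sample.

    Both inputs must be computed with the same score file (same variants and weights).
    """
    if len(reference_scores) < 2:
        raise ValueError("need at least two reference scores to estimate an SD")
    mu = mean(reference_scores)
    sd = stdev(reference_scores)  # sample standard deviation
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return [(s - mu) / sd for s in target_scores]


# Hypothetical toy data
reference = [1.0, 2.0, 3.0]
target = [2.0, 4.0]
print(standardize_to_reference(target, reference))  # → [0.0, 2.0]
```

Under this scheme, a standardized score of 2.0 reads as "two reference standard deviations above the reference mean", which is meaningful only when the reference population matches the target individual's ancestry.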
Freely available on the web at https://github.com/opain/GenoPred.