The GenoPred pipeline: a comprehensive and scalable pipeline for polygenic scoring.

Affiliations

Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 9RX, United Kingdom.

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, United Kingdom.

Publication information

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae551.

PMID: 39292536

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11462442/

Abstract

MOTIVATION

Polygenic scoring is an approach for estimating an individual's likelihood of a given outcome. Polygenic scores are typically calculated from genome-wide association study (GWAS) summary statistics and individual-level genotype data for the target sample. Going from genotype to interpretable polygenic scores involves many steps and there are many methods available, limiting the accessibility of polygenic scores for research and clinical application. Additional challenges exist for studies in ancestrally diverse populations. We have implemented the leading polygenic scoring methodologies within an easy-to-use pipeline called GenoPred.

RESULTS

Here, we present the GenoPred pipeline, an easy-to-use, high-performance, reference-standardized, and reproducible workflow for polygenic scoring. It requires minimal inputs and offers various configuration options to cater to a range of use cases. GenoPred implements a comprehensive set of analyses, including genotype and GWAS quality control, target sample ancestry inference, polygenic score file generation using a range of leading methods, and target sample scoring. GenoPred standardizes the polygenic scoring process using reference genetic data, providing interpretable polygenic scores. The pipeline is applicable to GWAS and target data from any population within the reference, facilitating studies of diverse ancestry. GenoPred is a Snakemake pipeline with associated Conda software environments, ensuring reproducibility. We apply the pipeline to UK Biobank data, demonstrating its simplicity, efficiency, and performance. The GenoPred pipeline provides a novel resource for polygenic scoring, integrating a range of complex processes within an easy-to-use framework. GenoPred widens access to the leading polygenic scoring methodologies and their application to studies of diverse ancestry.
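To make the idea of reference-standardized scoring concrete, the sketch below computes a polygenic score as a weighted sum of effect-allele dosages and then rescales target scores by the mean and standard deviation of the same score in a reference sample. This is an illustration under stated assumptions, not GenoPred's implementation: the function names, the toy weights, and the toy dosage matrices are all hypothetical, and the actual pipeline may standardize differently (for example, within an inferred ancestry group).

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (rows: individuals, columns: variants)."""
    return dosages @ weights


def reference_standardize(target_scores, reference_scores):
    """Express target scores in units of the reference sample's mean and SD."""
    mean = reference_scores.mean()
    sd = reference_scores.std(ddof=1)  # sample standard deviation
    return (target_scores - mean) / sd


# Hypothetical per-variant weights, e.g. derived from GWAS summary statistics.
weights = np.array([0.10, -0.05, 0.20])

# Hypothetical dosages (0, 1, or 2 copies of the effect allele).
reference_dosages = np.array(
    [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]], dtype=float
)
target_dosages = np.array([[2, 0, 2], [0, 2, 0]], dtype=float)

reference_scores = polygenic_score(reference_dosages, weights)
target_scores = polygenic_score(target_dosages, weights)
target_z = reference_standardize(target_scores, reference_scores)

print(target_z)
```

Because the scale comes from the reference rather than from the target sample, a score computed for a single individual remains interpretable: a positive value means the individual scores above the reference average, a negative value below it.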

AVAILABILITY AND IMPLEMENTATION

Freely available on the web at https://github.com/opain/GenoPred.
