H3AGWAS: a portable workflow for genome wide association studies.

Affiliations

Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa.

HPCBio, Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL, USA.

Publication information

BMC Bioinformatics. 2022 Nov 19;23(1):498. doi: 10.1186/s12859-022-05034-w.

PMID: 36402955
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9675212/
Abstract

BACKGROUND

Genome-wide association studies (GWAS) are a powerful method to detect associations between variants and phenotypes. A GWAS requires several complex computations on large data sets, and many steps may need to be repeated with varying parameters. Running these analyses manually can be tedious, error-prone and hard to reproduce.
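
The computation repeated across every variant is a per-variant association test. As a minimal sketch (an illustration, not the workflow's own implementation), the Python below fits an additive-model linear regression of a quantitative phenotype on genotype dosage (0, 1 or 2 copies of the effect allele), with no covariates. The function name `linear_association` is hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
from typing import Sequence, Tuple


def linear_association(genotypes: Sequence[int], phenotype: Sequence[float]) -> Tuple[float, float, float]:
    """Test one variant: OLS regression of phenotype on additive allele dosage (0/1/2).

    Returns (beta, t_statistic, p_value). No covariates; the p-value uses a
    normal approximation, so it is only sensible for large sample sizes.
    """
    n = len(genotypes)
    if n != len(phenotype) or n < 3:
        raise ValueError("need equal-length genotype and phenotype vectors with n >= 3")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic variant: genotype dosage has no variance")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx                     # per-allele effect estimate
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of beta
    if se == 0.0:
        # Perfect fit (no residual variance): the t-statistic is unbounded.
        t_stat = math.copysign(math.inf, beta) if beta != 0.0 else 0.0
    else:
        t_stat = beta / se
    # Two-sided p-value from the standard normal approximation to the t distribution.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return beta, t_stat, p_value
```

In a real study this test runs once per variant, typically with covariates added, and repeating it over millions of variants and several parameter settings is what makes manual execution error-prone.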

RESULTS

The H3AGWAS workflow from the Pan-African Bioinformatics Network for H3Africa is a powerful, scalable and portable workflow that implements pre-association analysis, a range of association-testing methods, and post-association analysis of results.
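
The stages before and after testing can be made concrete with a small Python sketch: per-variant quality control (call-rate and minor-allele-frequency filters) before association, and selection of hits afterwards. The `Variant` record and the thresholds (95% call rate, 1% MAF) are hypothetical illustrations, not H3AGWAS defaults; the 5e-8 cut-off is the conventional genome-wide significance level. Sample-level QC is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Conventional genome-wide significance threshold used in post-association filtering.
GENOME_WIDE_SIGNIFICANCE = 5e-8


@dataclass
class Variant:
    """Hypothetical per-variant record: an ID plus one genotype call per sample."""
    variant_id: str
    genotypes: Sequence[Optional[int]]  # allele dosage 0/1/2; None marks a missing call


def call_rate(v: Variant) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not v.genotypes:
        return 0.0
    return sum(g is not None for g in v.genotypes) / len(v.genotypes)


def minor_allele_frequency(v: Variant) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in v.genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # each diploid sample carries two alleles
    return min(alt_freq, 1.0 - alt_freq)


def qc_filter(variants: Sequence[Variant], min_call_rate: float = 0.95, min_maf: float = 0.01) -> List[Variant]:
    """Pre-association step: keep variants passing both (illustrative) thresholds."""
    return [v for v in variants if call_rate(v) >= min_call_rate and minor_allele_frequency(v) >= min_maf]


def significant_hits(pvalues: Dict[str, float], threshold: float = GENOME_WIDE_SIGNIFICANCE) -> List[str]:
    """Post-association step: IDs below the threshold, most significant first."""
    return sorted((vid for vid, p in pvalues.items() if p < threshold), key=lambda vid: pvalues[vid])
```

Keeping each filter as a separate, parameterised function mirrors why a workflow helps: thresholds can be changed and the steps re-run without editing the analysis by hand.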

CONCLUSIONS

The workflow scales from a laptop to a cluster to the cloud (e.g., SLURM, AWS Batch, Azure). All required software is containerised and can run under Docker or Singularity.
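
Running unchanged from a laptop to a cluster or cloud relies on splitting the analysis into independent tasks that an execution backend can schedule anywhere. The Python below is only an analogy for that scatter-gather pattern, not how H3AGWAS dispatches jobs: the `executor` argument stands in for the backend (a local thread pool here; a cluster or cloud batch scheduler in a real deployment), and `chunk` and `scatter_gather` are hypothetical names.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunk(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements (the scatter step)."""
    if size < 1:
        raise ValueError("chunk size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def scatter_gather(
    items: Sequence[T],
    size: int,
    task: Callable[[Sequence[T]], List[R]],
    executor: Executor,
) -> List[R]:
    """Run `task` on every chunk via `executor`, then concatenate results in input order."""
    futures = [executor.submit(task, part) for part in chunk(items, size)]
    results: List[R] = []
    for future in futures:  # iterate in submission order so output order is deterministic
        results.extend(future.result())
    return results


if __name__ == "__main__":
    # Local backend: a thread pool. Any other concurrent.futures.Executor could be swapped in.
    with ThreadPoolExecutor(max_workers=4) as pool:
        squares = scatter_gather(list(range(10)), 3, lambda part: [x * x for x in part], pool)
    print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because the task code never refers to the backend, changing where it runs means changing only the executor, which is the property that lets a containerised workflow move between environments.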

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32c7/9675212/e600758f2cd9/12859_2022_5034_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32c7/9675212/98598a108736/12859_2022_5034_Fig2_HTML.jpg

Similar articles

1
H3AGWAS: a portable workflow for genome wide association studies.
BMC Bioinformatics. 2022 Nov 19;23(1):498. doi: 10.1186/s12859-022-05034-w.
2
BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data.
Gigascience. 2021 Jun 29;10(6). doi: 10.1093/gigascience/giab047.
3
Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics.
BMC Bioinformatics. 2018 Nov 29;19(1):457. doi: 10.1186/s12859-018-2446-1.
4
yQTL Pipeline: A structured computational workflow for large scale quantitative trait loci discovery and downstream visualization.
PLoS One. 2024 Jun 4;19(6):e0298501. doi: 10.1371/journal.pone.0298501. eCollection 2024.
5
Scalable Workflows and Reproducible Data Analysis for Genomics.
Methods Mol Biol. 2019;1910:723-745. doi: 10.1007/978-1-4939-9074-0_24.
6
ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications.
BMC Bioinformatics. 2023 Nov 8;24(1):424. doi: 10.1186/s12859-023-05548-x.
7
Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data.
BMC Bioinformatics. 2019 Jun 28;20(1):364. doi: 10.1186/s12859-019-2964-5.
8
nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline.
J Open Source Softw. 2021;6(59). doi: 10.21105/joss.02957. Epub 2021 Mar 2.
9
eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity.
Mol Ecol Resour. 2021 Jul;21(5):1697-1704. doi: 10.1111/1755-0998.13356. Epub 2021 Mar 9.
10
Tibanna: software for scalable execution of portable pipelines on the cloud.
Bioinformatics. 2019 Nov 1;35(21):4424-4426. doi: 10.1093/bioinformatics/btz379.

Cited by

1
Predicting suicidality in people living with HIV in Uganda: a machine learning approach.
Front Psychiatry. 2025 Aug 15;16:1584335. doi: 10.3389/fpsyt.2025.1584335. eCollection 2025.
2
Genome-wide association study identifies common variants associated with breast cancer in South African Black women.
Nat Commun. 2025 Apr 14;16(1):3542. doi: 10.1038/s41467-025-58789-0.
3
Genome-wide association study identifying novel risk variants associated with glycaemic traits in the continental African AWI-Gen cohort.
Diabetologia. 2025 Jun;68(6):1184-1196. doi: 10.1007/s00125-025-06395-6. Epub 2025 Mar 1.
4
Assessment of the functionality and usability of open-source rare variant analysis pipelines.
Brief Bioinform. 2025 Feb 5;26(1). doi: 10.1093/bib/bbaf044.
5
Genetic association and transferability for urinary albumin-creatinine ratio as a marker of kidney disease in four Sub-Saharan African populations and non-continental individuals of African ancestry.
Front Genet. 2024 May 15;15:1372042. doi: 10.3389/fgene.2024.1372042. eCollection 2024.
6
Performing highly parallelized and reproducible GWAS analysis on biobank-scale data.
NAR Genom Bioinform. 2024 Feb 7;6(1):lqae015. doi: 10.1093/nargab/lqae015. eCollection 2024 Mar.
7
Genome-wide association study of population-standardised cognitive performance phenotypes in a rural South African community.
Commun Biol. 2023 Mar 27;6(1):328. doi: 10.1038/s42003-023-04636-1.