BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data.

Affiliations

Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Rosalind-Franklin-Str. 12, 24105 Kiel, Germany.

Haematology Lab Kiel, Klinik für Innere Medizin II, University Hospital Schleswig-Holstein, Langer Segen 8-10, 24105 Kiel, Germany.

Publication information

Gigascience. 2021 Jun 29;10(6). doi: 10.1093/gigascience/giab047.

DOI:10.1093/gigascience/giab047
PMID:34184051
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8239664/
Abstract

BACKGROUND

Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples.

RESULTS

Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs.
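To put the quoted figures in perspective, the following back-of-the-envelope sketch computes the number of genotype calls and the aggregate node-days implied by the abstract, and shows one simple way a marker set could be partitioned across workers. The abstract does not describe how BIGwas implements its dynamic parallelization, so `split_markers` and its `chunks_per_worker` parameter are hypothetical illustrations, not the authors' method. The node-day figure assumes all 7 nodes were busy for the full ~16 days.

```python
import math
from typing import List

# Figures quoted in the abstract.
N_INDIVIDUALS = 974_818
N_MARKERS = 92_000_000
RUNTIME_DAYS = 16  # approximate ("~16 days")
N_NODES = 7

# Total genotype calls (individuals x markers) in the data set.
genotype_calls = N_INDIVIDUALS * N_MARKERS

# Aggregate compute budget, assuming every node is busy for the whole run.
node_days = RUNTIME_DAYS * N_NODES


def split_markers(n_markers: int, n_workers: int, chunks_per_worker: int = 4) -> List[range]:
    """Partition marker indices into contiguous, near-equal chunks.

    Hypothetical helper: the chunk count scales with the number of
    workers, so a larger cluster receives more, smaller work units.
    """
    if n_markers <= 0 or n_workers <= 0 or chunks_per_worker <= 0:
        return []
    target_chunks = n_workers * chunks_per_worker
    size = math.ceil(n_markers / target_chunks)
    return [range(start, min(start + size, n_markers))
            for start in range(0, n_markers, size)]


chunks = split_markers(N_MARKERS, N_NODES)
print(f"genotype calls: {genotype_calls:.3e}")
print(f"node-days:      {node_days}")
print(f"marker chunks:  {len(chunks)}")
```

Scaling the chunk count with the worker count is one common way to let a larger cluster finish sooner, which matches the abstract's claim in spirit only.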

CONCLUSIONS

Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.
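As a rough illustration of what a single-command launch could look like, the sketch below assembles a generic Nextflow invocation for one of the repositories named above. It relies only on Nextflow's general `nextflow run <owner>/<repository>` form and its generic `-c` and `-resume` options. The configuration file name `my_dataset.config` is a hypothetical placeholder, and the pipeline-specific parameters that BIGwas requires are not stated in the abstract, so they are deliberately left out. The repository READMEs remain the authoritative source for the actual command.

```python
import shlex
from typing import List, Optional


def build_nextflow_command(pipeline: str,
                           config: Optional[str] = None,
                           resume: bool = False) -> List[str]:
    """Assemble a generic `nextflow run` command as an argument list.

    Only generic Nextflow options are used; no BIGwas-specific
    parameters are assumed here.
    """
    cmd = ["nextflow", "run", pipeline]
    if config is not None:
        cmd += ["-c", config]      # extra Nextflow configuration file
    if resume:
        cmd.append("-resume")      # reuse cached results of an earlier run
    return cmd


# "my_dataset.config" is a hypothetical file name used for illustration only.
qc_cmd = build_nextflow_command("ikmb/gwas-qc", config="my_dataset.config")
print(shlex.join(qc_cmd))
```

Building the command as a list rather than a string keeps it safe to pass to `subprocess.run` without shell quoting issues.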

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4425/8239664/0eab2e7e04b5/giab047fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4425/8239664/7f340c7ef600/giab047fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4425/8239664/9775b143d614/giab047fig3.jpg
