Song Zeyuan, Gurinovich Anastasia, Federico Anthony, Monti Stefano, Sebastiani Paola
Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118, USA.
Section of Computational Biomedicine, Boston University School of Medicine, 72 East Concord St., Boston, MA 02118, USA.
J Open Source Softw. 2021;6(59). doi: 10.21105/joss.02957. Epub 2021 Mar 2.
A tool for conducting Genome-Wide Association Studies (GWAS) in a systematic, automated, and reproducible manner is overdue. We developed an automated GWAS pipeline by combining multiple analysis tools (bcftools, vcftools, ANNOVAR, and the R packages SNPRelate, GENESIS, and GMMAT) through Nextflow, a portable, flexible, and reproducible reactive workflow framework for developing pipelines. The pipeline integrates data quality control and assessment with genetic association analyses, including analyses of cross-sectional and longitudinal studies using either single-variant or gene-based tests, into a unified analysis workflow. The pipeline is implemented in Nextflow, its dependencies are distributed through Docker, and the code is publicly available on GitHub.
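The central design idea is that quality control and association testing run as one chained workflow, so every analysis applies the same steps in the same order. The following Python sketch is a toy illustration of that chaining only and is not the pipeline's code. The actual pipeline performs QC with bcftools/vcftools and association with SNPRelate/GENESIS/GMMAT inside Nextflow. In this sketch, the function names, the 0/1/2 dosage coding, and the thresholds (at most 5% missing calls, minor allele frequency of at least 0.01) are hypothetical. The association step is also deliberately simplified: it fits a plain ordinary least squares regression for unrelated individuals with a cross-sectional trait.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical coding: alternate-allele dosage 0, 1, 2; None marks a missing call.
Genotypes = Dict[str, List[Optional[int]]]


def qc_filter(genos: Genotypes, max_missing: float = 0.05,
              min_maf: float = 0.01) -> Genotypes:
    """QC step: keep variants passing (hypothetical) call-rate and MAF thresholds."""
    kept: Genotypes = {}
    for variant_id, calls in genos.items():
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # no usable calls at all
        missing_rate = 1 - len(observed) / len(calls)
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if missing_rate <= max_missing and maf >= min_maf:
            kept[variant_id] = calls
    return kept


def single_variant_test(calls: List[Optional[int]],
                        phenotype: List[float]) -> Optional[Tuple[float, float]]:
    """Association step: OLS of phenotype on dosage (additive model).

    Uses complete cases only and returns (beta, t_statistic),
    or None when the variant cannot be tested.
    """
    pairs = [(g, y) for g, y in zip(calls, phenotype) if g is not None]
    n = len(pairs)
    if n < 3:
        return None  # too few observations for a residual variance
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0:
        return None  # monomorphic among complete cases
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    ssr = sum((y - intercept - beta * g) ** 2 for g, y in pairs)
    se = math.sqrt(ssr / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t_stat


def run_pipeline(genos: Genotypes,
                 phenotype: List[float]) -> Dict[str, Tuple[float, float]]:
    """Chain QC and association so every run applies the same steps in order."""
    results: Dict[str, Tuple[float, float]] = {}
    for variant_id, calls in qc_filter(genos).items():
        stats = single_variant_test(calls, phenotype)
        if stats is not None:
            results[variant_id] = stats
    return results


if __name__ == "__main__":
    example_genos: Genotypes = {
        "rs1": [0, 1, 2, 1, 0, 2],
        "rs2": [0, 0, 0, 0, 0, 0],        # monomorphic: fails the MAF filter
        "rs3": [None, None, 1, 0, 2, 1],  # 33% missing: fails the call-rate filter
    }
    example_phenotype = [1.0, 2.5, 3.0, 1.5, 1.0, 3.5]
    print(run_pipeline(example_genos, example_phenotype))
```

Keeping each stage a separate function mirrors how a workflow manager treats steps as independent processes whose outputs feed the next stage. A realistic association step would also account for relatedness and population structure, which is what mixed-model packages such as GENESIS and GMMAT are designed for, and which this sketch ignores.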