Cieza Basilio, Pandey Neetesh, Ruhela Vivek, Ali Sarwan, Tosto Giuseppe
Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
The Gertrude H. Sergievsky Center, Vagelos College of Physicians and Surgeons, Columbia University. 630 West 168 Street, New York, NY 10032, USA.
bioRxiv. 2025 Aug 29:2025.08.25.672146. doi: 10.1101/2025.08.25.672146.
Genome-wide association studies (GWAS) have enabled clinicians and researchers to identify genetic variants linked to complex traits and diseases (1). However, conducting a GWAS remains technically challenging without bioinformatics expertise because it requires data preprocessing, software installation, and analysis scripting (2,3). SAGA is a Bash-based, open-source, fully automated pipeline that integrates three widely adopted tools, PLINK (4), GMMAT (5), and SAIGE (6), for accessible, robust, and reproducible GWAS. After installation, users provide only standard genotype and phenotype files. The pipeline automates preprocessing, association testing, and visualization, and outputs summary statistics, Manhattan plots, and quantile-quantile (Q-Q) plots. SAGA enables robust GWAS for users with no scripting experience, expanding access to complex genetic analyses.