GWAS Procedures for Gene Mapping in Diverse Populations With Complex Structures.

Author information

Zuo Zhen, Li Mingliang, Liu Defu, Li Qi, Huang Bin, Ye Guanshi, Wang Jiabo, Tang You, Zhang Zhiwu

Affiliations

Electrical and Information Engineering College, Jilin Agricultural Science and Technology University, Jilin, Jilin, China.

Information Technology Academy, Jilin Agricultural University, Changchun, Jilin, China.

Publication information

Bio Protoc. 2025 Apr 20;15(8):e5284. doi: 10.21769/BioProtoc.5284.

DOI:10.21769/BioProtoc.5284

PMID:40291431

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12021685/

Abstract

As genotyping costs fall, genome-wide association studies (GWAS) that map genes of interest in diverse populations with complex structures face more challenges. Complex structure demands sophisticated statistical models, while increased marker density and population size require efficient computing tools. Many statistical models and computing tools have been developed, and they differ in statistical power, computing efficiency, and user-friendliness. Some statistical models were developed together with dedicated computing tools, such as efficient mixed model analysis (EMMA), multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Other computing tools (e.g., GAPIT) implement multiple statistical models, retain a consistent user interface, and continue to improve the interpretation of input data and results. In this study, we developed a protocol that uses a minimal set of software tools (BEAGLE, BLINK, and GAPIT) to perform a variety of analyses, including file format conversion, missing genotype imputation, GWAS, and interpretation of input data and results. We demonstrated the protocol by reanalyzing data from the Rice 3000 Genomes Project and highlighted advancements in GWAS model development.
