Torres Isaac, Zhang Shufan, Bouffier Amanda, Schüttler Bernd, Arnold Jonathan
Institute of Bioinformatics, University of Georgia, Athens, GA 30602, United States.
G3 (Bethesda). 2025 Sep 3;15(9). doi: 10.1093/g3journal/jkaf163.
The computational methodology of genome-wide association studies (GWAS) currently has several limitations: (i) the number of observations (rows) on a quantitative trait tends to be smaller than the number of single nucleotide polymorphisms (SNPs) (columns) in the design matrix; (ii) each SNP is usually modeled separately, which ignores interactions among SNPs (i.e., epistasis); (iii) neighboring SNPs are implicitly in linkage disequilibrium (LD) because of their physical linkage. To overcome these issues, we developed a tool that uses ensemble methods to fit mixed linear models to GWAS data. These ensemble methods support a new experimental design approach for GWAS, the maximally informative next experiment (MINE), which uses the fitted models and the accumulated data to select the next informative experiment over time. This adaptive, staged approach to GWAS experimental design was developed and tested in a 3-year adaptive model-guided discovery experiment against a fixed classical design. In Sorghum bicolor, 79, 86, and 78 accessions were tested in years 1, 2, and 3, respectively, out of the 343 accessions available in the Bioenergy Association Panel (BAP), each genotyped at 232,303 SNPs, 1 every 2-3 kb in the genome. We demonstrated the feasibility of MINE carried out by 8 people in the field per year over 3 years, compared with 1 large classical design carried out by 20 people in 1 year. The chromosomal regions that MINE identified as controlling dry weight were confirmed against results from previous sorghum GWAS experiments and from the 1 large classical design for the BAP.
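The adaptive design described in the abstract can be illustrated with a small sketch. It is not the authors' tool and does not reproduce their mixed linear model or their selection criterion. It assumes a bootstrap ensemble of ridge regressions as a simple stand-in for an ensemble of fitted models, and it assumes that the "next informative experiment" can be approximated by choosing the untested accessions on which the ensemble members disagree most. The function names `fit_ridge_ensemble` and `pick_next_accessions`, the ridge penalty, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_ridge_ensemble(X, y, n_models=50, ridge=1.0, seed=0):
    """Fit ridge regressions on bootstrap resamples of the tested accessions.

    X: (n_accessions, n_snps) genotype matrix, y: (n_accessions,) trait values.
    Returns SNP effect estimates with shape (n_models, n_snps).
    Assumption: ridge regression stands in for the paper's mixed linear model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = np.empty((n_models, p))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of accessions
        Xb, yb = X[idx], y[idx]
        # Dual form of ridge: solve an n x n system instead of p x p,
        # which stays cheap when SNPs (columns) far outnumber accessions (rows).
        gram = Xb @ Xb.T + ridge * np.eye(n)
        betas[m] = Xb.T @ np.linalg.solve(gram, yb)
    return betas


def pick_next_accessions(betas, X_candidates, k):
    """Return indices of the k candidates with the largest ensemble disagreement.

    Assumption: variance of predictions across ensemble members is used as a
    simplified proxy for how informative testing an accession would be.
    """
    preds = betas @ X_candidates.T        # (n_models, n_candidates)
    disagreement = preds.var(axis=0)      # spread of predictions per candidate
    return np.argsort(disagreement)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n_tested, n_untested, n_snps = 40, 300, 2000   # synthetic sizes, p >> n
    X_tested = rng.integers(0, 3, size=(n_tested, n_snps)).astype(float)
    X_untested = rng.integers(0, 3, size=(n_untested, n_snps)).astype(float)
    # Synthetic trait driven by two SNPs plus noise (illustrative only).
    y = 2.0 * X_tested[:, 10] - 1.5 * X_tested[:, 500] + rng.normal(size=n_tested)

    betas = fit_ridge_ensemble(X_tested, y, n_models=25)
    next_round = pick_next_accessions(betas, X_untested, k=8)
    print("Candidate accessions to test next:", next_round)
```

In a staged design of the kind the abstract describes, a step like this would be repeated each season: the newly phenotyped accessions are added to the tested set, the ensemble is refitted, and the next batch is chosen from the remaining accessions.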