Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana, 47907, USA.
Present address: Whitehead Institute for Biomedical Research, 455 Main Street, Cambridge, MA, 02142, USA.
BMC Plant Biol. 2022 Jul 2;22(1):315. doi: 10.1186/s12870-022-03668-9.
Genome-Wide Association Studies (GWAS) are used to identify genes and alleles that contribute to quantitative traits in large and genetically diverse populations. However, traits with complex genetic architectures create an enormous computational load for discovery of candidate genes with acceptable statistical certainty. We developed a streamlined computational pipeline for GWAS (COMPILE) that accelerates identification and annotation of candidate maize genes associated with a quantitative trait, and then matches those maize genes to their closest rice and Arabidopsis homologs by sequence similarity.
COMPILE executed GWAS using a Mixed Linear Model that incorporated, without compression, recent advancements in population structure control, then linked significant Quantitative Trait Loci (QTL) to candidate genes and RNA regulatory elements contained in any genome. COMPILE was validated using published data to identify QTL associated with the traits of α-tocopherol biosynthesis and flowering time, and identified published candidate genes as well as additional genes and non-coding RNAs. We then applied COMPILE to 274 genotypes of the maize Goodman Association Panel to identify candidate loci contributing to resistance of maize stems to penetration by larvae of the European Corn Borer (Ostrinia nubilalis). Candidate genes included a gene of unknown function and genes encoding WRKY and MYB-like transcription factors, as well as genes involved in receptor-kinase signaling, riboflavin synthesis, nucleotide-sugar interconversion, and prolyl hydroxylation. Expression of the gene of unknown function, and of its rice homologs closest in sequence identity, has been associated with pathogen stress.
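The step of linking significant QTL to candidate genes can be illustrated with a minimal sketch. It is not COMPILE's implementation: the record types, the function name `link_snps_to_genes`, the p-value threshold, the window size, and the gene identifiers are all hypothetical, chosen only to show the idea of reporting annotated genes that overlap a fixed window around each significant marker.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snp:
    """One GWAS marker with its association p-value (hypothetical record type)."""
    chrom: str
    pos: int
    p_value: float


@dataclass(frozen=True)
class Gene:
    """One annotated gene interval (hypothetical record type)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def link_snps_to_genes(
    snps: List[Snp],
    genes: List[Gene],
    p_threshold: float,
    window_bp: int,
) -> Dict[Tuple[str, int], List[str]]:
    """Map each significant SNP to genes overlapping a +/- window_bp interval.

    The threshold and window are assumptions for illustration; the abstract
    does not state which values COMPILE uses.
    """
    hits: Dict[Tuple[str, int], List[str]] = {}
    for snp in snps:
        if snp.p_value > p_threshold:
            continue  # marker is not significant, skip it
        lo, hi = snp.pos - window_bp, snp.pos + window_bp
        # A gene is a candidate if its interval overlaps [lo, hi] on the same chromosome.
        hits[(snp.chrom, snp.pos)] = [
            g.gene_id
            for g in genes
            if g.chrom == snp.chrom and g.start <= hi and g.end >= lo
        ]
    return hits


# Toy data with made-up identifiers and coordinates.
example_snps = [Snp("1", 1000, 1e-8), Snp("1", 50000, 0.2), Snp("2", 500, 1e-9)]
example_genes = [
    Gene("geneA", "1", 1500, 2500),
    Gene("geneB", "1", 40000, 41000),
    Gene("geneC", "2", 100, 400),
    Gene("geneD", "2", 90000, 91000),
]
candidates = link_snps_to_genes(example_snps, example_genes, p_threshold=1e-6, window_bp=1000)
print(candidates)
```

In practice the window would more plausibly be set from the extent of linkage disequilibrium in the population than from a fixed constant, and the gene list would come from a genome annotation file rather than from hand-written records.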
The relative speed of data analysis using COMPILE allowed comparison of the effects of population size and compression. Limitations in population size and diversity are major constraints on GWAS for a trait and are not overcome by increasing marker density. COMPILE is customizable and is readily adaptable for application to species with robust genomic and proteomic databases.