Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208.
G3 (Bethesda). 2013 Nov 6;3(11):2085-94. doi: 10.1534/g3.113.008417.
Genome-wide association studies are a powerful method to dissect the genetic basis of traits, although in practice the effects of complex genetic architecture and population structure remain poorly understood. To compare mapping strategies, we dissected the genetic control of flavonoid pigmentation traits in the cereal grass sorghum by using high-resolution genotyping-by-sequencing single-nucleotide polymorphism markers. Studying the grain tannin trait, we find that general linear models (GLMs) are not able to precisely map tan1-a, a known loss-of-function allele of the Tannin1 gene, with either a small (n = 142) or a large (n = 336) association panel, and that indirect associations limit the mapping of the Tannin1 locus to Mb-resolution. A GLM that accounts for population structure (Q) or a standard mixed linear model that accounts for kinship (K) can identify tan1-a, whereas a compressed mixed linear model performs worse than the naive GLM. Interestingly, a simple loss-of-function genome scan, which tests for genotype-phenotype covariation only in the putative loss-of-function allele, is able to precisely identify the Tannin1 gene without considering relatedness. We also find that the tan1-a allele can be mapped with gene resolution in a biparental recombinant inbred line family (n = 263) using genotyping-by-sequencing markers, but lower precision in the mapping of vegetative pigmentation traits suggests that consistent gene-level resolution will likely require larger families or multiple recombinant inbred lines. These findings highlight that complex association signals can emerge from even the simplest traits given epistasis and structured alleles, but that gene-resolution mapping of these traits is possible with high marker density and appropriate models.
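The loss-of-function genome scan mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, so the version below is an assumption rather than the authors' procedure: for each marker allele it reports the fraction of carriers that lack the trait, on the reasoning that every carrier of a true loss-of-function allele should show the null phenotype, while carriers of the functional allele may or may not express the trait because of other loci. The function name `lof_scan`, the 0/1 genotype coding for inbred lines, the `min_carriers` threshold, and the toy data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed coding: genotype calls for inbred lines are 0 or 1 (None = missing);
# phenotype is True if the trait (e.g. grain tannin) is present, False if absent.
Genotypes = Dict[str, List[Optional[int]]]
Hit = Tuple[str, int, int, float]  # (marker, allele, n_carriers, null_fraction)


def lof_scan(genotypes: Genotypes,
             phenotype: List[Optional[bool]],
             min_carriers: int = 5) -> List[Hit]:
    """Score each marker allele by the fraction of its carriers lacking the trait.

    No population-structure or kinship correction is applied; the score only
    looks at carriers of the candidate allele (an illustrative assumption).
    """
    hits: List[Hit] = []
    for marker, calls in genotypes.items():
        for allele in (0, 1):
            # Phenotypes of individuals that carry this allele and were scored.
            carriers = [phenotype[i] for i, g in enumerate(calls)
                        if g == allele and phenotype[i] is not None]
            if len(carriers) < min_carriers:
                continue  # too few carriers to be informative
            null_fraction = sum(1 for p in carriers if not p) / len(carriers)
            hits.append((marker, allele, len(carriers), null_fraction))
    # Best candidates first: highest null fraction, then most carriers.
    hits.sort(key=lambda h: (-h[3], -h[2]))
    return hits


# Toy data (hypothetical): individual 5 lacks the trait despite carrying the
# functional allele at the causal marker, mimicking an effect of another locus.
genotypes = {
    "snp_causal":   [1, 1, 1, 0, 0, 0, 0, 0],
    "snp_linked":   [1, 1, 0, 0, 0, 0, 1, 0],
    "snp_unlinked": [0, 1, 0, 1, 0, 1, 0, 1],
}
phenotype = [False, False, False, True, True, False, True, True]

results = lof_scan(genotypes, phenotype, min_carriers=3)
for marker, allele, n, frac in results:
    print(f"{marker} allele={allele} carriers={n} null_fraction={frac:.2f}")
```

In this toy example the top-ranked entry is allele 1 of `snp_causal`, the only allele whose carriers all lack the trait. The one-sided score is unaffected by the trait-negative individual that carries the functional allele, which is the kind of case that would weaken a symmetric association test.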