Department of Plant Science and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea.
Proc Natl Acad Sci U S A. 2010 Dec 21;107(51):22032-7. doi: 10.1073/pnas.1009526107. Epub 2010 Dec 3.
The genome of soybean (Glycine max), a commercially important crop, was recently sequenced, making it one of six crop species with a sequenced genome. Here we report the genome sequence of G. soja, the undomesticated ancestor of G. max (specifically, G. soja var. IT182932). A total of 48.8 Gb of Illumina Genome Analyzer (Illumina-GA) short DNA reads was aligned to the G. max reference genome, and a consensus sequence was determined for G. soja. This consensus sequence spanned 915.4 Mb, covering 97.65% of the published G. max genome sequence at an average mapping depth of 43-fold. The nucleotide sequence of the G. soja genome, which contains 2.5 Mb of substituted bases and 406 kb of small insertions/deletions relative to G. max, differs from that of G. max by ∼0.31%. In addition to the mapped 915.4-Mb consensus sequence, 32.4 Mb of large deletions and 8.3 Mb of novel sequence contigs were detected in the G. soja genome. Nucleotide variants of G. soja versus G. max called from Illumina-GA reads were checked against Roche Genome Sequencer FLX sequencing, which showed 99.99% concordance for single-nucleotide polymorphisms and 98.82% agreement for insertion/deletion calls. The data presented in this study suggest that the G. soja/G. max complex may be at least 0.27 million y old, predating the relatively recent domestication event (6,000 to 9,000 y ago). This suggests that soybean domestication is complicated and that more in-depth population genetic studies are needed. In any case, genome comparison of the domesticated and undomesticated forms of soybean can facilitate soybean improvement.
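The percentages quoted above can be roughly cross-checked from the other figures in the abstract. The following is a minimal sketch, not the authors' calculation: it assumes the sequence difference is the sum of substituted bases and small insertions/deletions divided by the consensus length, and that the coverage figure is the consensus length divided by the reference length. The constant and function names are hypothetical, chosen for illustration only.

```python
# Figures quoted in the abstract, all in megabases (Mb).
CONSENSUS_MB = 915.4       # mapped G. soja consensus sequence
SUBSTITUTED_MB = 2.5       # substituted bases relative to G. max
SMALL_INDEL_MB = 0.406     # small insertions/deletions (406 kb)
COVERAGE_FRACTION = 0.9765 # share of the G. max sequence covered


def percent_difference(substituted_mb, indel_mb, consensus_mb):
    """Variant bases as a percentage of the consensus length.

    Assumption: the denominator is the consensus length; the abstract
    does not state which denominator the authors used.
    """
    return 100.0 * (substituted_mb + indel_mb) / consensus_mb


def implied_reference_mb(consensus_mb, coverage_fraction):
    """Reference length implied by the stated coverage (an inference)."""
    return consensus_mb / coverage_fraction


if __name__ == "__main__":
    diff = percent_difference(SUBSTITUTED_MB, SMALL_INDEL_MB, CONSENSUS_MB)
    ref = implied_reference_mb(CONSENSUS_MB, COVERAGE_FRACTION)
    print(f"sequence difference: {diff:.3f}%")
    print(f"implied reference length: {ref:.1f} Mb")
```

Under these assumptions the difference comes out slightly above 0.31%, which is consistent with the stated ∼0.31% given that the 2.5-Mb input is itself rounded.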