Shultz Jeffry L, Ray Jeffery D, Lightfoot David A
USDA-ARS, Crop Genetics and Production Research Unit, Stoneville, MS 38776, USA.
BMC Genomics. 2007 Jan 8;8:8. doi: 10.1186/1471-2164-8-8.
Soybean, Glycine max (L.) Merr., is one of the world's most important crops; however, its complete genomic sequence has yet to be determined. Nonetheless, a large body of sequence information exists, particularly in the form of expressed sequence tags (ESTs). Herein, we report the use of the model organism Arabidopsis thaliana (thale cress), for which the entire genomic sequence is available, as a framework to align thousands of short soybean sequences.
A series of Java-based programs was created to process and compare 341,619 soybean DNA sequences against A. thaliana chromosomal DNA. A. thaliana DNA was probed for short, exact matches (15 bp) to each soybean sequence, and each match was then checked for the number of additional 7 bp matches in the adjacent 400 bp region. The positions of these matches were used to order the soybean sequences in relation to the A. thaliana genome.
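The seed-and-count step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' program: the class and method names (`SeedAndCount`, `findHits`, `countSupport`) are hypothetical, only the forward strand is searched, and "adjacent" is assumed to mean the 400 bp immediately downstream of the seed, which the abstract does not specify.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a 15 bp exact-seed search followed by a count of
// shared 7 bp words in the adjacent reference window.
final class SeedAndCount {
    static final int SEED_LEN = 15;   // exact-match seed length from the abstract
    static final int KMER_LEN = 7;    // supporting word length from the abstract
    static final int WINDOW = 400;    // adjacent region size from the abstract

    // One seed hit: where it lies in the reference and query, and its support.
    static final class Hit {
        final int refPos;
        final int queryPos;
        final int support;
        Hit(int refPos, int queryPos, int support) {
            this.refPos = refPos;
            this.queryPos = queryPos;
            this.support = support;
        }
    }

    static List<Hit> findHits(String query, String ref) {
        return findHits(query, ref, SEED_LEN, KMER_LEN, WINDOW);
    }

    static List<Hit> findHits(String query, String ref,
                              int seedLen, int kmerLen, int window) {
        // Collect every k-mer of the query once, for fast lookup.
        Set<String> queryKmers = new HashSet<>();
        for (int i = 0; i + kmerLen <= query.length(); i++) {
            queryKmers.add(query.substring(i, i + kmerLen));
        }
        List<Hit> hits = new ArrayList<>();
        // Try each seed-length word of the query as an exact seed.
        for (int q = 0; q + seedLen <= query.length(); q++) {
            String seed = query.substring(q, q + seedLen);
            int pos = ref.indexOf(seed);
            while (pos >= 0) {
                int support = countSupport(queryKmers, ref, pos + seedLen,
                                           kmerLen, window);
                hits.add(new Hit(pos, q, support));
                pos = ref.indexOf(seed, pos + 1);
            }
        }
        return hits;
    }

    // Count reference k-mers in [start, start + window) that occur in the query.
    // Assumption: the window lies downstream of the seed.
    static int countSupport(Set<String> queryKmers, String ref, int start,
                            int kmerLen, int window) {
        int end = Math.min(ref.length(), start + window);
        int count = 0;
        for (int i = start; i + kmerLen <= end; i++) {
            if (queryKmers.contains(ref.substring(i, i + kmerLen))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Toy input with shortened parameters so the example is easy to follow.
        for (Hit h : findHits("ACGTAAGGG", "CCACGTCCGGGCC", 4, 3, 6)) {
            System.out.println("ref=" + h.refPos + " query=" + h.queryPos
                    + " support=" + h.support);
        }
    }
}
```

Under these assumptions, hits with a higher support count are stronger candidates, and sorting the retained hits by `refPos` would give an ordering of soybean sequences along the reference. A linear `indexOf` scan is used only for brevity; at chromosome scale an index of the reference would be a more practical choice.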
Reported associations between soybean sequences and A. thaliana fell within a 95% confidence interval of e(-30) to e(-100). In addition, the clustering of soybean ESTs based on A. thaliana sequence was accurate enough to identify potential single nucleotide polymorphisms (SNPs) within the soybean sequence clusters. A synteny map of soybean and A. thaliana, built from ESTs, bacterial artificial chromosome (BAC) end sequences and marker amplicon sequences, is presented. In addition, all Java programs used to create this map are available upon request and on the web.
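The abstract does not state how potential SNPs were called within a cluster. One common heuristic, shown here purely as an assumption, is to flag an alignment column when at least two different bases are each seen in a minimum number of reads. The class `SnpCandidates`, the method `candidateColumns` and the `minAlleleCount` threshold are all hypothetical, and the input is assumed to be reads already aligned to equal length, with `-` for gaps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flag columns of a pre-aligned EST cluster where two or
// more bases are each supported by at least minAlleleCount reads.
final class SnpCandidates {
    static List<Integer> candidateColumns(List<String> aligned, int minAlleleCount) {
        List<Integer> columns = new ArrayList<>();
        if (aligned.isEmpty()) {
            return columns;
        }
        int width = aligned.get(0).length();
        for (String row : aligned) {
            if (row.length() != width) {
                throw new IllegalArgumentException("rows must have equal length");
            }
        }
        for (int c = 0; c < width; c++) {
            int[] counts = new int[4]; // counts for A, C, G, T in this column
            for (String row : aligned) {
                int idx = "ACGT".indexOf(Character.toUpperCase(row.charAt(c)));
                if (idx >= 0) {        // gaps and ambiguous bases are ignored
                    counts[idx]++;
                }
            }
            int supportedAlleles = 0;
            for (int n : counts) {
                if (n >= minAlleleCount) {
                    supportedAlleles++;
                }
            }
            if (supportedAlleles >= 2) {
                columns.add(c);        // 0-based column index
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("ACGTA", "ACGTA", "ACCTA", "ACCTA", "ACGTT");
        System.out.println(candidateColumns(cluster, 2));
    }
}
```

Requiring each allele to appear in more than one read is a simple guard against counting isolated sequencing errors as polymorphisms; the appropriate threshold would depend on cluster depth and read quality.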