Schloss J., Mitchell E., White M., Kukatla R., Bowers E., Paterson H., Kresovich S.
Department of Plant Breeding and Institute for Genomic Diversity, Cornell University, 157 Biotechnology Building, Ithaca, NY 14853, USA.
Theor Appl Genet. 2002 Nov;105(6-7):912-920. doi: 10.1007/s00122-002-0991-4. Epub 2002 Jul 30.
In this study, we collected and analyzed DNA sequence data for 789 previously mapped RFLP probes from Sorghum bicolor (L.) Moench. DNA sequences, comprising 894 non-redundant contigs and end sequences, were searched against three GenBank databases, nucleotide (nt), protein (nr) and EST (dbEST), using BLAST algorithms. Matching ESTs were also searched against nt and nr. Translated DNA sequences were then searched against the conserved domain database (CDD) to determine if functional domains/motifs were congruent with the proteins identified in previous searches. More than half (500/894 or 56%) of the query sequences had significant matches in at least one of the GenBank searches. Overall, proteins identified for 148 sequences (17%) were consistent among all searches, of which 66 sequences (7%) contained congruent coding domains. The RFLP probe sequences were also evaluated for the presence of simple sequence repeats (SSRs), and 60 SSRs were developed and assayed in an array of sorghum germplasm comprising inbreds, landraces and wild relatives. Overall, these SSR loci had lower levels of polymorphism (D = 0.46, averaged over 51 polymorphic loci) compared with sorghum SSRs that were isolated by library hybridization screens (D = 0.69, averaged over 38 polymorphic loci). This result was probably due to the relatively small proportion of di-nucleotide repeat-containing markers (42% of the total SSR loci) obtained from the DNA sequence data. These di-nucleotide markers also contained shorter repeat motifs than those isolated from genomic libraries. Based on BLAST results, 24 SSRs (40%) were located within, or near, previously annotated or hypothetical genes. We determined the location of 19 of these SSRs relative to putative coding regions. In general, SSRs located in coding regions were less polymorphic (D = 0.07, averaged over three loci) than those from gene flanking regions, UTRs and introns (D = 0.49, averaged over 16 loci).
The sequence information and SSR loci generated through this study will be valuable for application to sorghum genetics and improvement, including gene discovery, marker-assisted selection, diversity and pedigree analyses, comparative mapping and evolutionary genetic studies.
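The two computational steps the abstract mentions, scanning sequences for SSRs and summarizing polymorphism as D, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the minimum repeat counts, and the restriction to perfect di-, tri- and tetra-nucleotide repeats are hypothetical choices made for illustration. The abstract does not define D; the sketch assumes the common gene diversity form, D = 1 - sum(p_i^2) over allele frequencies p_i, which may differ from the estimator actually used.

```python
import re
from collections import Counter

# Hypothetical thresholds: minimum number of tandem copies per motif length.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. AGAG, AAA)."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k)
        for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # Group 2 is the motif; \2 requires at least (min_count - 1) more copies.
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_count - 1)
        )
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs such as AAAA or AGAG, which belong to a shorter class.
            if is_compound_of_shorter_unit(motif):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // motif_len))
    return sorted(hits)


def gene_diversity(alleles):
    """Assumed form of D: 1 minus the sum of squared allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Made-up sequence: an (AG)8 run and a (CAG)5 run with short flanks.
    example = "TTC" + "AG" * 8 + "CCT" + "CAG" * 5 + "T"
    print(find_ssrs(example))
    # Made-up allele calls (fragment sizes) for one locus across four lines.
    print(gene_diversity([150, 150, 154, 154]))
```

Under this assumed definition, a locus with a single allele gives D = 0 and D rises as alleles become more numerous and more evenly distributed, which is consistent with the abstract's use of lower D values to indicate less polymorphic loci.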