Gur-Arie R, Cohen C J, Eitan Y, Shelef L, Hallerman E M, Kashi Y
Department of Food Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel.
Genome Res. 2000 Jan;10(1):62-71.
Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.
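The abstract reports a computer-based scan for tandem SSR tracts with 1 to 6 nt motifs, the A/T share of mononucleotide SSRs, and each SSR's position relative to downstream ORFs, but it does not spell out the algorithm. The following is a minimal Python sketch of those three steps, not the authors' method. It assumes perfect (uninterrupted) repeats only, per-motif-length copy-number cutoffs in `MIN_COPIES` that are purely illustrative, and forward-strand ORFs given as sorted 0-based start coordinates. All names (`find_ssrs`, `is_primitive`, `at_fraction_of_mononucleotide_ssrs`, `distance_to_downstream_orf`, `MIN_COPIES`) are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from dataclasses import dataclass

# Hypothetical, illustrative cutoffs: minimum number of tandem copies needed to
# call an SSR tract for each motif length (1-6 nt). NOT the study's thresholds.
MIN_COPIES = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


@dataclass
class SSR:
    start: int   # 0-based start of the tract
    motif: str   # repeat unit, e.g. "A" or "AC"
    copies: int  # number of complete tandem copies

    @property
    def end(self) -> int:
        # Exclusive end coordinate of the tract.
        return self.start + len(self.motif) * self.copies


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_copies=MIN_COPIES) -> list:
    """Scan seq for perfect tandem repeats with motif lengths 1-6 (sketch only)."""
    seq = seq.upper()
    found = []
    for k in range(1, 7):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Extend while the next k-mer is an exact copy of the motif;
            # a slice running past the end is shorter than k and stops the loop.
            while seq[i + copies * k: i + (copies + 1) * k] == motif:
                copies += 1
            if is_primitive(motif) and copies >= min_copies[k]:
                found.append(SSR(i, motif, copies))
                i += copies * k  # jump past the tract to avoid phase-shifted duplicates
            else:
                i += 1
    return found


def at_fraction_of_mononucleotide_ssrs(ssrs) -> float:
    """Fraction of mononucleotide SSRs whose motif is A or T."""
    mono = [s.motif for s in ssrs if len(s.motif) == 1]
    if not mono:
        return 0.0
    counts = Counter(mono)
    return (counts["A"] + counts["T"]) / len(mono)


def distance_to_downstream_orf(ssr: SSR, orf_starts: list):
    """Distance (nt) from the SSR's end to the nearest ORF start at or after it.

    Assumes forward-strand ORFs with sorted 0-based starts; returns None
    if no ORF starts downstream of the SSR.
    """
    idx = bisect_left(orf_starts, ssr.end)
    if idx == len(orf_starts):
        return None
    return orf_starts[idx] - ssr.end


if __name__ == "__main__":
    demo = "GC" + "A" * 10 + "GTG" + "AC" * 6 + "G"
    ssrs = find_ssrs(demo)
    print([(s.start, s.motif, s.copies) for s in ssrs])  # [(2, 'A', 10), (15, 'AC', 6)]
    print(at_fraction_of_mononucleotide_ssrs(ssrs))       # 1.0
    print(distance_to_downstream_orf(ssrs[0], [14, 30]))   # 2
```

The `is_primitive` check keeps a run such as `AAAAAAAAAA` from also being reported as an `AA` or `AAAA` repeat, and skipping past each called tract avoids reporting the same `AC` run again as `CA`. A real genome-wide scan would also need to handle imperfect repeats, both strands, and ORF annotations parsed from the genome record.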