Kumpatla Siva P, Mukhopadhyay Snehasis
Indiana University School of Informatics, IUPUI, Indianapolis 46202, USA.
Genome. 2005 Dec;48(6):985-98. doi: 10.1139/g05-060.
Simple sequence repeat (SSR) markers are widely used in many plant and animal genomes due to their abundance, hypervariability, and suitability for high-throughput analysis. Development of SSR markers using molecular methods is time-consuming, laborious, and expensive. Use of computational approaches to mine the ever-increasing number of sequences in public databases, such as expressed sequence tags (ESTs), permits rapid and economical discovery of SSRs. Most such efforts to date have focused on mining SSRs from monocotyledonous ESTs. In this study, we computationally mined and examined the abundance of SSRs in more than 1.54 million ESTs belonging to 55 dicotyledonous species. The frequency of ESTs containing SSRs ranged from 2.65% to 16.82% among species. Dinucleotide repeats were the most abundant, followed by tri- or mononucleotide repeats. The motifs A/T, AG/GA/CT/TC, and AAG/AGA/GAA/CTT/TTC/TCT were the predominant mono-, di-, and trinucleotide SSRs, respectively. Most of the mononucleotide SSRs contained 15-25 repeats, whereas the majority of the di- and trinucleotide SSRs contained 5-10 repeats. The comprehensive SSR survey data presented here demonstrate the potential of in silico mining of ESTs for rapid development of SSR markers for genetic analysis and applications in dicotyledonous crops.
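As a rough illustration of what in silico SSR mining involves, the following Python sketch scans sequences for perfect mono-, di-, and trinucleotide repeats with a regular expression and reports the percentage of sequences containing at least one. It is not the pipeline used in the study. The function names (`find_ssrs`, `ssr_frequency`) and the minimum repeat counts (15 for mononucleotide, 5 for di- and trinucleotide motifs) are assumptions chosen for illustration, loosely guided by the repeat ranges reported above.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed for illustration,
# not the thresholds used in the study).
MIN_REPEATS = {1: 15, 2: 5, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} requires at least n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # A di- or trinucleotide "motif" made of a single base is really
            # a mononucleotide run, so it is not counted here.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def ssr_frequency(seqs, min_repeats=MIN_REPEATS):
    """Percentage of sequences that contain at least one SSR."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    with_ssr = sum(1 for s in seqs if find_ssrs(s, min_repeats))
    return 100.0 * with_ssr / len(seqs)


if __name__ == "__main__":
    # Toy sequence: (AG)6, (GAA)5 and (A)16 separated by short spacers.
    est = "CCGT" + "AG" * 6 + "TTC" + "GAA" * 5 + "C" + "A" * 16 + "G"
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
    print(ssr_frequency([est, "ACGTACGTTGCA"]))
```

The sketch treats each motif literally, so AG, GA, CT, and TC are reported as distinct motifs. Grouping them into a single class, as in the motif families listed above, would need an extra normalization step over rotations and reverse complements.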