Tang Jifeng, Baldwin Samantha J, Jacobs Jeanne ME, van der Linden C Gerard, Voorrips Roeland E, Leunissen Jack AM, van Eck Herman, Vosman Ben
Laboratory of Bioinformatics, Wageningen University, PO Box 8128, 6700 ET Wageningen, the Netherlands.
BMC Bioinformatics. 2008 Sep 15;9:374. doi: 10.1186/1471-2105-9-374.
Simple sequence repeat (SSR), or microsatellite, markers are valuable for genetic research. Experimental methods for developing SSR markers are laborious, time-consuming and expensive. Over the last decade, in silico approaches have become a practicable and relatively inexpensive alternative, although testing putative SSR markers is still time-consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). EST databases are highly redundant, and the redundant sequences may carry information on length polymorphisms in the SSRs they contain, as well as on whether they were derived from heterozygotes or from different genotypes. Although a number of programs have been developed to identify SSRs in EST sequences, until now no software has been able to detect putatively polymorphic SSRs.
We have developed PolySSR, a new pipeline that identifies polymorphic SSRs rather than just SSRs. Sequence information is obtained from public EST databases derived from heterozygous individuals and/or at least two different genotypes. The pipeline includes PCR primer design for the putatively polymorphic SSR markers that takes into account single nucleotide polymorphisms (SNPs) in the flanking regions, thereby improving the success rate of the potential markers. A large number of polymorphic SSRs were identified using publicly available EST sequences of potato, tomato, rice, Arabidopsis, Brassica and chicken. The SSRs obtained were divided into long and short classes based on the number of times the motif was repeated. Surprisingly, the frequency of polymorphic SSRs was much higher among the short SSRs.
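The core idea, that redundant ESTs covering the same locus can reveal differences in SSR repeat number, can be illustrated with a minimal Python sketch. This is not the PolySSR implementation: the function names, the minimum repeat threshold, the flank length and the exact-match comparison of flanking sequence are all assumptions made for illustration, and a real pipeline would cluster and align the ESTs rather than compare flanks literally. SNP-aware primer design is not covered here.

```python
import re
from collections import defaultdict

# Hypothetical settings; the abstract does not state the values PolySSR uses.
MIN_REPEATS = 3   # minimum number of motif copies to call an SSR
FLANK = 8         # number of flanking bases used to recognise the same locus


def find_ssrs(seq, min_repeats=MIN_REPEATS, flank=FLANK):
    """Yield (motif, repeat_count, left_flank, right_flank) for perfect
    di- to tetranucleotide repeats that have full-length flanks."""
    seq = seq.upper()
    # A 2-4 base motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        if len(left) < flank or len(right) < flank:
            continue  # SSR too close to the end of the read to anchor
        yield motif, len(m.group(0)) // len(motif), left, right


def polymorphic_ssrs(ests):
    """Return loci (motif, left_flank, right_flank) observed with more
    than one distinct repeat count, mapped to the sorted counts."""
    loci = defaultdict(set)
    for seq in ests:
        for motif, count, left, right in find_ssrs(seq):
            loci[(motif, left, right)].add(count)
    return {locus: sorted(c) for locus, c in loci.items() if len(c) > 1}


# Toy ESTs (invented): the AT locus varies in length, the CAG locus does not.
ests = [
    "GGCTAGCC" + "AT" * 5 + "CGTTAGGC",
    "GGCTAGCC" + "AT" * 8 + "CGTTAGGC",
    "TTGACCGA" + "CAG" * 4 + "TCCGGATA",
    "TTGACCGA" + "CAG" * 4 + "TCCGGATA",
]
result = polymorphic_ssrs(ests)
print(result)
```

In this toy input only the AT repeat is reported, because it occurs with 5 and 8 copies between identical flanks, while the CAG repeat has 4 copies in both reads and is therefore treated as monomorphic.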
PolySSR is a very effective tool for identifying polymorphic SSRs. Using PolySSR, several hundred putative markers were developed and stored in a searchable database. Validation experiments showed that almost all markers that PolySSR indicated as putatively polymorphic were indeed polymorphic. This greatly improves the efficiency of marker development, especially in species with low levels of polymorphism, such as tomato. When combined with the new sequencing technologies, PolySSR will have a big impact on the development of polymorphic SSRs in any species. PolySSR and the polymorphic SSR marker database are available from http://www.bioinformatics.nl/tools/polyssr/.