Xia En-Hua, Yao Qiu-Yang, Zhang Hai-Bin, Jiang Jian-Jun, Zhang Li-Ping, Gao Li-Zhi
Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China.
Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.
Front Plant Sci. 2016 Jan 7;6:1171. doi: 10.3389/fpls.2015.01171. eCollection 2015.
Simple sequence repeats (SSRs), also known as microsatellites, are ubiquitous short tandem repeats found in the genomes and/or transcriptomes of diverse organisms. They are among the most powerful molecular markers for genetic analysis and breeding programs because of their high mutation rate and neutral evolution. However, experimentally screening SSRs for polymorphism and assessing their applicability to genetic studies has traditionally been extremely labor-intensive and time-consuming. The recent decrease in the cost of next-generation sequencing and the increasing availability of large genome and/or transcriptome sequences provide an excellent opportunity and resource for large-scale mining of this type of molecular marker. However, current tools are limited. We therefore developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) from multiple assembled sequences. The pipeline allows users to identify putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. In addition, two confidence metrics, the standard deviation and the missing rate of the SSR repeat counts, are provided to systematically assess the feasibility of the detected PolySSRs for subsequent genetic characterization. Primer pairs for each identified PolySSR are also designed automatically and further evaluated by the global sequence similarity of the primer-binding regions, which helps ensure a high success rate in marker development. Screening rice genomes with CandiSSR and subsequent experimental validation showed an accuracy rate of over 90%. In addition, CandiSSR successfully identified a large number of PolySSRs in Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker resources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html.
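The two confidence metrics named in the abstract can be illustrated with a minimal Python sketch. This is not CandiSSR's own code: the function names (`count_repeats`, `polymorphism_metrics`), the input layout (one repeat count per sample, `None` where the locus is absent), and the use of the population standard deviation are all assumptions made for illustration only.

```python
import re
from statistics import pstdev


def count_repeats(sequence, motif):
    """Return the longest run of tandem copies of `motif` in `sequence`.

    Hypothetical helper: a real SSR finder also handles motif rotations,
    minimum repeat thresholds, and flanking-sequence checks.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def polymorphism_metrics(repeat_counts):
    """Compute (standard deviation, missing rate) for one SSR locus.

    `repeat_counts` maps sample name -> repeat count, or None when the
    locus was not found in that sample's assembly (assumed layout).
    """
    observed = [c for c in repeat_counts.values() if c is not None]
    missing_rate = 1 - len(observed) / len(repeat_counts)
    # Population SD is an assumption; the source does not say which form is used.
    sd = pstdev(observed) if len(observed) > 1 else 0.0
    return sd, missing_rate


# Three hypothetical assemblies; the locus is absent from sample "C".
flank_left, flank_right = "GGC", "TTC"
sequences = {
    "A": flank_left + "AG" * 5 + flank_right,
    "B": flank_left + "AG" * 7 + flank_right,
    "C": None,
}
counts = {
    name: count_repeats(seq, "AG") if seq is not None else None
    for name, seq in sequences.items()
}
sd, missing_rate = polymorphism_metrics(counts)
print(counts, sd, round(missing_rate, 3))
```

Under these assumptions, a locus with a nonzero standard deviation and a low missing rate would be a more promising PolySSR candidate than one whose repeat count is constant or that is absent from most assemblies.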