CandiSSR: An Efficient Pipeline used for Identifying Candidate Polymorphic SSRs Based on Multiple Assembled Sequences.

Authors

Xia En-Hua, Yao Qiu-Yang, Zhang Hai-Bin, Jiang Jian-Jun, Zhang Li-Ping, Gao Li-Zhi

Affiliations

Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China; University of Chinese Academy of Sciences, Beijing, China.

Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.

Publication information

Front Plant Sci. 2016 Jan 7;6:1171. doi: 10.3389/fpls.2015.01171. eCollection 2015.

DOI:10.3389/fpls.2015.01171

PMID:26779212

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4703815/

Abstract

Simple sequence repeats (SSRs), also known as microsatellites, are short tandem repeats found throughout the genomes and transcriptomes of diverse organisms. Their high mutation rate and neutral evolution make them one of the most powerful classes of molecular markers for genetic analysis and breeding programs. However, traditional experimental screening of SSR polymorphism, and of the applicability of SSRs to genetic studies, is extremely labor-intensive and time-consuming. The falling cost of next-generation sequencing and the growing availability of large genome and transcriptome sequences now provide an excellent opportunity and resource for large-scale mining of these markers, but current tools are limited. We therefore developed a new pipeline, CandiSSR, to identify candidate polymorphic SSRs (PolySSRs) from multiple assembled sequences. The pipeline identifies putative PolySSRs not only from transcriptome datasets but also from multiple assembled genome sequences. Two confidence metrics, the standard deviation and the missing rate of the SSR repeat counts, are provided to systematically assess how suitable the detected PolySSRs are for subsequent genetic characterization. Primer pairs for each identified PolySSR are also designed automatically and further evaluated by the global sequence similarity of the primer-binding regions, which helps ensure the success rate of marker development. Screening rice genomes with CandiSSR, followed by experimental validation, showed an accuracy rate of over 90%. In addition, CandiSSR identified a large number of PolySSRs in Arabidopsis genomes and Camellia transcriptomes. CandiSSR and the PolySSR marker sources are publicly available at: http://www.plantkingdomgdb.com/CandiSSR/index.html.
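To make the two confidence metrics concrete, the following is a minimal Python sketch, not CandiSSR's actual code. It assumes each locus is summarized as one repeat count per assembled sequence, with `None` where the locus could not be found, and it assumes the population standard deviation; the abstract does not say which form the pipeline uses. The function names `longest_repeat_count` and `ssr_confidence` are hypothetical.

```python
import re
from statistics import pstdev
from typing import Optional, Sequence, Tuple


def longest_repeat_count(seq: str, motif: str) -> Optional[int]:
    """Return the repeat count of the longest perfect tandem run of `motif`.

    Hypothetical helper: returns None when the motif does not occur at all.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq)]
    return max(runs) if runs else None


def ssr_confidence(repeats: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (standard deviation, missing rate) for one SSR locus.

    `repeats` holds one repeat count per sample; None marks a missing locus.
    Assumption: population standard deviation over the observed samples.
    """
    if not repeats:
        raise ValueError("need at least one sample")
    observed = [r for r in repeats if r is not None]
    missing_rate = 1 - len(observed) / len(repeats)
    sd = pstdev(observed) if observed else 0.0
    return sd, missing_rate


# Illustrative (made-up) repeat counts for one locus across five assemblies.
locus = [12, 15, None, 12, 18]
sd, missing_rate = ssr_confidence(locus)
print(f"SD = {sd:.3f}, missing rate = {missing_rate:.2f}")
```

Under these assumptions, a locus with a larger standard deviation varies more in repeat number across samples, while a lower missing rate means the locus was recovered in more of the input assemblies.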

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2461/4703815/c82037e0d8fa/fpls-06-01171-g001.jpg

Similar articles

Large-scale identification of polymorphic microsatellites using an in silico approach.

BMC Bioinformatics. 2008 Sep 15;9:374. doi: 10.1186/1471-2105-9-374.

SSRMMD: A Rapid and Accurate Algorithm for Mining SSR Feature Loci and Candidate Polymorphic SSRs Based on Assembled Sequences.

Front Genet. 2020 Jul 27;11:706. doi: 10.3389/fgene.2020.00706. eCollection 2020.

Pipeline for developing polymorphic microsatellites in species without reference genomes.

3 Biotech. 2022 Oct;12(10):248. doi: 10.1007/s13205-022-03313-0. Epub 2022 Aug 26.

IDSSR: An Efficient Pipeline for Identifying Polymorphic Microsatellites from a Single Genome Sequence.

Int J Mol Sci. 2019 Jul 16;20(14):3497. doi: 10.3390/ijms20143497.

Develop a preliminary core germplasm with the novel polymorphism EST-SSRs derived from three transcriptomes of colored calla lily ().

Front Plant Sci. 2023 Feb 2;14:1055881. doi: 10.3389/fpls.2023.1055881. eCollection 2023.

SSREnricher: a computational approach for large-scale identification of polymorphic microsatellites based on comparative transcriptome analysis.

PeerJ. 2020 Jul 2;8:e9372. doi: 10.7717/peerj.9372. eCollection 2020.

GMATA: An Integrated Software Package for Genome-Scale SSR Mining, Marker Development and Viewing.

Front Plant Sci. 2016 Sep 13;7:1350. doi: 10.3389/fpls.2016.01350. eCollection 2016.

A pipeline for effectively developing highly polymorphic simple sequence repeats markers based on multi-sample genomic data.

Ecol Evol. 2022 Mar 6;12(3):e8705. doi: 10.1002/ece3.8705. eCollection 2022 Mar.

Mining and characterization of novel EST-SSR markers of Parrotia subaequalis (Hamamelidaceae) from the first Illumina-based transcriptome datasets.

PLoS One. 2019 May 6;14(5):e0215874. doi: 10.1371/journal.pone.0215874. eCollection 2019.

Cited by

Genome Skimming Reveals Plastome Conservation, Phylogenetic Structure, and Novel Molecular Markers in Valuable Orchid .

Genes (Basel). 2025 Jun 20;16(7):723. doi: 10.3390/genes16070723.

SSR_VibraProfiler: a Python package for accurate classification of varieties using SSRs with intra-variety specificity and inter-variety polymorphism.

Plant Methods. 2025 May 16;21(1):61. doi: 10.1186/s13007-025-01380-x.

Development of genome-wide SSR markers through mining of guava ( L.) genome for genetic diversity analysis and transferability studies across species and genera.

Front Plant Sci. 2025 Apr 25;16:1527866. doi: 10.3389/fpls.2025.1527866. eCollection 2025.

Genome-wide microsatellite characterization and their marker development and transferability in Broussonetia Species.

BMC Genomics. 2025 Jan 22;26(1):61. doi: 10.1186/s12864-025-11238-0.

Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning.

Plants (Basel). 2024 Sep 19;13(18):2619. doi: 10.3390/plants13182619.

TriticeaeSSRdb: a comprehensive database of simple sequence repeats in .

Front Plant Sci. 2024 May 22;15:1412953. doi: 10.3389/fpls.2024.1412953. eCollection 2024.

: A Pipeline for Identification of Polymorphic Microsatellites Loci within Assemblies of Related Species.

Int J Mol Sci. 2024 Mar 9;25(6):3169. doi: 10.3390/ijms25063169.

Low-coverage whole genome sequencing of diverse accessions for plastome resource development, polymorphic nuclear SSR identification, and phylogenetic analyses.

Front Plant Sci. 2024 Mar 6;15:1373297. doi: 10.3389/fpls.2024.1373297. eCollection 2024.

The Potential Role of Genic-SSRs in Driving Ecological Adaptation Diversity in Plants.

Int J Mol Sci. 2024 Feb 8;25(4):2084. doi: 10.3390/ijms25042084.

Intraspecific phylogeny and genomic resources development for an important medical plant , based on low-coverage whole genome sequencing data.

Front Plant Sci. 2023 Dec 12;14:1320473. doi: 10.3389/fpls.2023.1320473. eCollection 2023.
