Kmer-SSR: a fast and exhaustive SSR search algorithm.

Affiliation

Department of Biology, BYU, Provo, UT 84602, USA.

Publication information

Bioinformatics. 2017 Dec 15;33(24):3922-3928. doi: 10.1093/bioinformatics/btx538.

DOI:10.1093/bioinformatics/btx538

PMID:28968741

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860095/

Abstract

MOTIVATION

One of the main challenges with bioinformatics software is that the size and complexity of datasets necessitate a trade-off between speed and accuracy or completeness. To combat this problem of computational complexity, a plethora of heuristic algorithms have arisen that report a 'good enough' solution to biological questions. However, for problems such as identifying Simple Sequence Repeats (SSRs), a 'good enough' solution may not accurately portray results in population genetics, phylogenetics and forensics, which require accurate SSRs to calculate intra- and inter-species interactions.

RESULTS

We present Kmer-SSR, which finds all SSRs faster than most heuristic SSR identification algorithms in a parallelized, easy-to-use manner. The exhaustive Kmer-SSR option has 100% precision and 100% recall and accurately identifies every SSR of any specified length. To identify more biologically pertinent SSRs, we also developed several filters that let users view a subset of SSRs based on user input. Coupled with these filter options, Kmer-SSR identifies SSRs accurately, intuitively and quickly, and in a more user-friendly manner than any other SSR identification algorithm.
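To make "exhaustive" concrete, the sketch below enumerates every maximal perfect tandem repeat in a sequence by brute force. It is not the Kmer-SSR algorithm, which the abstract does not describe; it only illustrates what reporting every SSR of a specified length could mean. The function names `find_ssrs` and `is_primitive`, the default period range of 1 to 6, and the minimum of 3 repeats are all assumptions chosen for illustration.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AT', not 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_period=1, max_period=6, min_repeats=3):
    """Return sorted (start, motif, repeats) for every maximal perfect tandem repeat.

    Illustrative brute-force scan; thresholds are hypothetical defaults,
    not values taken from Kmer-SSR.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, max_period + 1):
        i = 0
        while i + p < n:
            # Extend the run while each base equals the base one period later.
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            span = (j - i) + p       # length of the region with period p
            repeats = span // p      # number of complete motif copies
            motif = seq[i:i + p]
            if repeats >= min_repeats and is_primitive(motif):
                hits.append((i, motif, repeats))
            i = j + 1                # position j was a mismatch; resume after it
    return sorted(hits)


if __name__ == "__main__":
    for start, motif, repeats in find_ssrs("GGACACACACTTTTTG"):
        print(start, motif, repeats)
```

Because every period length and every start position is checked, no perfect repeat within the chosen bounds is missed. Filters like those mentioned in the abstract could then be applied to the returned list, for example by keeping only certain motif lengths or repeat counts.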

AVAILABILITY AND IMPLEMENTATION

The source code is freely available on GitHub at https://github.com/ridgelab/Kmer-SSR.

CONTACT

perry.ridge@byu.edu.
