SA-SSR: a suffix array-based algorithm for exhaustive and efficient SSR discovery in large genetic sequences.

Author information

Pickett B D, Karlinsey S M, Penrod C E, Cormier M J, Ebbert M T W, Shiozawa D K, Whipple C J, Ridge P G

Affiliation

Department of Biology, Brigham Young University, Provo, UT 84602, USA.

Publication information

Bioinformatics. 2016 Sep 1;32(17):2707-9. doi: 10.1093/bioinformatics/btw298. Epub 2016 May 11.

Abstract

Simple sequence repeats (SSRs) are used to address research questions in a variety of fields (e.g. population genetics, phylogenetics and forensics) because of their high mutability within and between species. Here, we present an innovative algorithm, SA-SSR, based on suffix and longest common prefix arrays for efficiently detecting SSRs in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (e.g. speed, accuracy or ease of use). Our algorithm addresses these challenges while being the most comprehensive and correct SSR detection software available. SA-SSR is 100% accurate and detected >1000 more SSRs than the second-best algorithm, while offering greater control to the user than any existing software.
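
To illustrate how suffix and longest common prefix (LCP) arrays can expose tandem repeats, here is a minimal Python sketch. It is not the SA-SSR implementation: the function names, the parameters `min_period`, `max_period` and `min_copies`, and the rule of comparing only adjacent suffix-array entries are all assumptions made for illustration. The idea is that if two suffixes start `p` positions apart and share a prefix of at least `p` characters, the sequence at the earlier position repeats with period `p`. Because only adjacent entries are compared, this simplified scan can miss repeats in longer sequences, and it reports overlapping candidates without merging them.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes; adequate for a short illustration,
    # far too slow for genome-scale input.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def find_ssrs(s, min_period=1, max_period=6, min_copies=2):
    # Hypothetical parameters: bounds on the repeat unit length and copy number.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    hits = set()
    for k in range(1, len(sa)):
        period = abs(sa[k] - sa[k - 1])
        if not (min_period <= period <= max_period):
            continue
        if lcp[k] < period:
            continue  # shared prefix too short to form a tandem repeat
        start = min(sa[k], sa[k - 1])
        copies = (lcp[k] + period) // period
        if copies >= min_copies:
            hits.add((start, s[start:start + period], copies))
    return sorted(hits)  # tuples of (start index, repeat unit, copy count)


if __name__ == "__main__":
    for hit in find_ssrs("GGACACACTT", min_period=2):
        print(hit)
```

Each reported tuple is a raw candidate; a practical tool would still need to merge overlapping hits and apply the user's filtering criteria.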

AVAILABILITY AND IMPLEMENTATION

SA-SSR is freely available at http://github.com/ridgelab/SA-SSR

CONTACT

perry.ridge@byu.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
