Division of Software and Information Systems, School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798.
Brief Bioinform. 2013 Jan;14(1):67-81. doi: 10.1093/bib/bbs023. Epub 2012 May 29.
The prevalence of tandem repeats in eukaryotic genomes and their association with a number of genetic diseases have raised considerable interest in locating these repeats. Over the last 10-15 years, numerous tools have been developed for searching tandem repeats, but differences in the search algorithms adopted and difficulties with parameter settings have confounded many users, resulting in widely varying results. In this review, we have systematically separated the algorithmic aspect of the search tools from the influence of the parameter settings. We hope that this will give a better understanding of how the tools differ in algorithmic performance, what their inherent constraints are and how one should approach evaluating and selecting them.
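The abstract attributes much of the variation between tools to parameter settings. The Python sketch below illustrates that point on a toy scale only. It is not one of the reviewed tools and not the authors' method. It is a naive scanner for exact, perfectly repeated units, whereas real tools also tolerate mismatches and indels. The function names `find_exact_tandem_repeats` and `is_primitive` and the parameters `min_period`, `max_period` and `min_copies` are hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def is_primitive(unit: str) -> bool:
    # A unit is primitive if it is not itself a repeat of a shorter unit
    # (e.g. "CAG" is primitive, "CAGCAG" is not). Standard check: u is
    # non-primitive iff u occurs in (u + u) with its first and last chars removed.
    return unit not in (unit + unit)[1:-1]


def find_exact_tandem_repeats(
    seq: str, min_period: int = 1, max_period: int = 6, min_copies: int = 3
) -> List[Tuple[int, int, str, int]]:
    """Naive scan for EXACT tandem repeats (hypothetical helper, min_period >= 1).

    Returns (start, end, unit, copies) tuples with 0-based, end-exclusive
    coordinates. Mismatches and indels are NOT tolerated.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, max_period + 1):
        i = 0
        while i + p <= n:
            unit = seq[i:i + p]
            copies = 1
            # Extend while the next p-length window equals the unit; a short
            # window at the end of the sequence simply fails the comparison.
            while seq[i + copies * p:i + (copies + 1) * p] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, i + copies * p, unit, copies))
                i += copies * p  # resume scanning after this run
            else:
                i += 1
    return hits


if __name__ == "__main__":
    s = "CAGCAGCAGCAGTT"
    for k in (2, 3, 5):
        print(k, find_exact_tandem_repeats(s, min_copies=k))
```

On this toy sequence, `min_copies=3` reports only the (CAG)4 run. Lowering it to 2 also reports a (T)2 run, and raising it to 5 removes the CAG run entirely. This is a small-scale version of the parameter sensitivity the review describes.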