Parisi Valerio, De Fonzo Valeria, Aluffi-Pentini Filippo
Sez. INFM, EuroBioPark, Univ. Roma 'Tor Vergata', Via della Ricerca Scientifica 1, 00133 Roma, Italy.
Bioinformatics. 2003 Sep 22;19(14):1733-8. doi: 10.1093/bioinformatics/btg268.
The importance of Tandem Repeats in some genomes is now well established. We have reported elsewhere some interesting new results obtained with a preliminary program for finding Tandem Repeats in DNA sequences, together with a brief description of the basic ideas of the algorithm. Here we describe a completely new program based only in part on those ideas, briefly discuss the interpretation of the results and, by way of example, provide a few novel results on the pathogens responsible for two re-emerging diseases, Plasmodium falciparum and Mycobacterium tuberculosis. Our program is portable, effective, powerful and fast: it runs on current desktop computers and finds all significant Tandem Repeats even in the longest sequence segments in databases (up to millions of bases) in short times (minutes).
An academic version of the algorithm (full source listing in standard C) can be freely downloaded from http://www.caspur.it/~castri/STRING/.
Some illustrative figures and some sample results are provided as supplementary material at: http://www.caspur.it/~castri/STRING/