Taneda Akito
Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan.
Bioinformatics. 2004 Mar 22;20(5):701-8. doi: 10.1093/bioinformatics/btg470. Epub 2004 Jan 22.
Repetitive DNA sequences are abundant in genomes, and efficient mining of significant repeats is an important first step in repetitive sequence research. Although many computational tools, both automatic and visualization-based, have been developed for this purpose, detecting and analyzing approximate repeats remains a non-trivial task.
Auto Dot PLOT (Adplot), a dotplot-like program for visualizing repetitive patterns that uses window filtering based on iid Bernoulli trials, was developed and applied to yeast chromosomes and the human T cell receptor locus sequence. Typical examples found in yeast chromosomes 1 and 10, and a tandem repeat with periods longer than 10,000 bp in the human T cell receptor locus, are presented. A complex structure composed of both direct and palindromic repeats in yeast chromosome 10 is also visualized as a specific dot pattern. Computation time, measured on a Pentium 3 PC with a standard parameter setting, scales linearly and is below 10 s for each yeast chromosome, indicating the efficiency of the program. The examples show that Adplot can visualize approximate local repeat structures and provides diagnostic power for inferring the duplication history of repeats.
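As an illustration of what window filtering based on iid Bernoulli trials could look like, the following is a minimal sketch and not Adplot's actual implementation. It assumes that each position on a dotplot diagonal is an independent match/mismatch trial whose match probability is the sum of squared base frequencies, and that a window is kept when its match count is improbable under the resulting binomial null. All function names, the window length, and the significance level `alpha` are hypothetical choices made for this example.

```python
from collections import Counter
from math import comb


def match_probability(seq):
    """Chance that two independently drawn bases match (sum of squared frequencies)."""
    counts = Counter(seq)
    n = len(seq)
    return sum((c / n) ** 2 for c in counts.values())


def binomial_tail(w, m, p):
    """P(X >= m) for X ~ Binomial(w, p), i.e. w iid Bernoulli(p) trials."""
    return sum(comb(w, k) * p**k * (1 - p) ** (w - k) for k in range(m, w + 1))


def min_significant_matches(w, p, alpha):
    """Smallest match count whose tail probability is <= alpha (w + 1 if none)."""
    for m in range(w + 1):
        if binomial_tail(w, m, p) <= alpha:
            return m
    return w + 1


def filtered_self_dotplot(seq, window=10, alpha=1e-4):
    """Return (start, lag) pairs whose window passes the binomial filter.

    A pair (i, lag) means seq[i:i+window] and seq[i+lag:i+lag+window]
    share at least the threshold number of matching positions.
    """
    p = match_probability(seq)
    threshold = min_significant_matches(window, p, alpha)
    n = len(seq)
    hits = []
    for lag in range(1, n - window + 1):
        # Match indicators along the diagonal at this lag.
        matches = [seq[i] == seq[i + lag] for i in range(n - lag)]
        count = sum(matches[:window])
        for i in range(len(matches) - window + 1):
            if count >= threshold:
                hits.append((i, lag))
            # Slide the window one position to the right.
            if i + window < len(matches):
                count += matches[i + window] - matches[i]
    return hits


if __name__ == "__main__":
    unit = "ACGTTGCAAGCTTCGATCCA"
    for start, lag in filtered_self_dotplot(unit + unit):
        print(start, lag)
```

Plotting the returned pairs with the start position on one axis and the lag on the other gives a dotplot-like picture in which a tandem repeat appears as a run of dots at the lag equal to its period. This sketch scans every lag and is therefore quadratic in sequence length; it does not reproduce the linear scaling reported for Adplot.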
Adplot can be obtained by e-mail request.