Sharma Deepak, Issac Biju, Raghava G P S, Ramaswamy R
Department of Biotechnology, All India Institute of Medical Sciences, New Delhi 110029, India.
Bioinformatics. 2004 Jun 12;20(9):1405-12. doi: 10.1093/bioinformatics/bth103. Epub 2004 Feb 19.
Repetitive DNA sequences, besides having a variety of regulatory functions, are one of the principal causes of genomic instability. Understanding their origin and evolution is of fundamental importance for genome studies. Identifying repeats and their units helps in deducing intra-genomic dynamics, an important aspect of comparative genomics. A major difficulty in identifying repeats is that the repeat units can be exact or imperfect, in tandem or dispersed, and of unspecified length.
The Spectral Repeat Finder program circumvents these problems by using a discrete Fourier transformation to identify significant periodicities present in a sequence. The specific regions of the sequence that contribute to a given periodicity are located through a sliding window analysis, and an exact search method is then used to find the repetitive units. The program provides efficient and complete detection of repeats, together with interactive and detailed visualization of the spectral analysis of the input sequence. We demonstrate the utility of our method with various examples that contain previously unannotated repeats. A Web server has been developed for convenient access to the automated program.
The Web server is available at http://www.imtech.res.in/raghava/srf and http://www2.imtech.res.in/raghava/srf
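The three steps named in the abstract (a Fourier spectrum to find a periodicity, a sliding window to localize it, and an exact search for the unit) can be illustrated with a minimal Python sketch. This is not the Spectral Repeat Finder implementation: the function names, the use of four base-indicator sequences, the window and step sizes, and the k-mer counting used as the "exact search" are all assumptions made for illustration.

```python
import cmath
from collections import Counter

BASES = "ACGT"

def power_at(seq, freq):
    """Summed squared Fourier magnitude of the four base-indicator
    sequences at frequency `freq` (cycles per base). Hypothetical helper."""
    total = 0.0
    for base in BASES:
        # Fourier coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * freq * j)
                    for j, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total

def dominant_period(seq):
    """Period (in bases) of the strongest non-zero DFT frequency."""
    n = len(seq)
    best_k = max(range(1, n // 2 + 1), key=lambda k: power_at(seq, k / n))
    return n / best_k

def window_profile(seq, period, window, step):
    """Power at the given period inside each sliding window,
    returned as (window start, power) pairs."""
    return [(s, power_at(seq[s:s + window], 1.0 / period))
            for s in range(0, len(seq) - window + 1, step)]

def most_common_unit(region, period):
    """Most frequent exact substring of length `period` in the region
    (a simple stand-in for an exact repeat-unit search)."""
    counts = Counter(region[i:i + period]
                     for i in range(len(region) - period + 1))
    return counts.most_common(1)[0][0]

# Toy input: a tandem ACG repeat flanked by homopolymer runs.
toy = "A" * 30 + "ACG" * 10 + "T" * 30
period = round(dominant_period("ACG" * 20))
profile = window_profile(toy, period, window=30, step=15)
start = max(profile, key=lambda item: item[1])[0]
unit = most_common_unit(toy[start:start + 30], period)
print(period, start, unit)
```

On the toy input, the window with the highest power at the detected period marks where the tandem repeat lies, and counting exact substrings of that length within the window recovers a candidate unit. Imperfect or dispersed repeats would need a significance threshold on the spectrum and a more tolerant unit search, which this sketch does not attempt.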