Ye Congting, Ji Guoli, Li Lei, Liang Chun
Department of Automation, Xiamen University, Xiamen, Fujian 361005, China; Department of Biology, Miami University, Oxford, Ohio 45056, United States of America.
Department of Automation, Xiamen University, Xiamen, Fujian 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, Fujian 361005, China.
PLoS One. 2014 Nov 19;9(11):e113349. doi: 10.1371/journal.pone.0113349. eCollection 2014.
Inverted repeats are abundant in both prokaryotic and eukaryotic genomes and can form DNA secondary structures (hairpins and cruciforms) that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are needed because existing tools are often inaccurate and slow, and some cannot handle genome-scale input data. Here, we present detectIR, a MATLAB-based program for detecting perfect and imperfect inverted repeats that uses complex numbers and vector calculation and accepts genome-scale input. detectIR adopts a novel algorithm that converts the conventional string comparison used in inverted repeat detection into vector calculation on complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gap) in the middle of an inverted repeat. Compared with existing popular tools, our program achieves significantly higher accuracy and efficiency. In comparisons using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays, detectIR finds many inverted repeats missed by existing tools, whose outputs often contain many invalid cases. detectIR is open source, and its source code is freely available at https://sourceforge.net/projects/detectir.
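The idea of replacing string comparison with arithmetic on complex numbers can be illustrated with a minimal Python sketch. This is not the detectIR source code, and the abstract does not specify the encoding. The mapping below (A = 1, T = -1, C = i, G = -i) is an assumption chosen so that any complementary pair sums to zero. The function names and parameters are hypothetical, and the sequence is assumed to contain only A, C, G and T.

```python
# Assumed encoding (not taken from the abstract): complementary bases are
# negatives of each other, so a complementary pair sums to zero.
ENCODING = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def encode(seq):
    """Convert a DNA string (A, C, G, T only) into a list of complex numbers."""
    return [ENCODING[base] for base in seq.upper()]


def count_mismatches(vec, start, stem_len, spacer_len):
    """Count non-complementary pairs for one candidate inverted repeat.

    The left arm starts at `start`; the right arm follows after a spacer
    of `spacer_len` bases. Both arms have length `stem_len`.
    """
    left = vec[start:start + stem_len]
    right_start = start + stem_len + spacer_len
    right = vec[right_start:right_start + stem_len]
    # Pair the first base of the left arm with the last base of the right arm,
    # the second with the second-to-last, and so on.
    sums = [a + b for a, b in zip(left, reversed(right))]
    # A zero sum means the pair is complementary; anything else is a mismatch.
    return sum(1 for s in sums if s != 0)


def find_inverted_repeats(seq, stem_len, max_spacer=0, max_mismatches=0):
    """Return (start, stem_len, spacer_len, mismatches) for each candidate
    whose mismatch count does not exceed `max_mismatches`."""
    vec = encode(seq)
    hits = []
    for spacer_len in range(max_spacer + 1):
        span = 2 * stem_len + spacer_len
        for start in range(len(vec) - span + 1):
            mismatches = count_mismatches(vec, start, stem_len, spacer_len)
            if mismatches <= max_mismatches:
                hits.append((start, stem_len, spacer_len, mismatches))
    return hits


if __name__ == "__main__":
    # GAATTC is a perfect inverted repeat with a 3-base stem and no spacer.
    print(find_inverted_repeats("GAATTC", 3))
    # GCA..TGC with a 2-base spacer (TT) between the arms.
    print(find_inverted_repeats("GCATTTGC", 3, max_spacer=2))
```

This brute-force scan only shows how the mismatch count falls out of an element-wise sum. A genome-scale tool would need a vectorized and far more efficient search, which the sketch does not attempt.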