Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Bld. 2, 33 Leninsky Ave., 119071 Moscow, Russia.
Moscow Engineering Physics Institute, National Research Nuclear University MEPhI, 31 Kashirskoye Shosse, 115409 Moscow, Russia.
Int J Mol Sci. 2023 Jun 30;24(13):10964. doi: 10.3390/ijms241310964.
We have developed a de novo method for identifying dispersed repeats that is based on random position-weight matrices (PWMs) and an iterative procedure (IP). The resulting algorithm (the IP method) detects dispersed repeats for which the average number of substitutions per nucleotide between any two repeats (x) is less than or equal to 1.5. We have shown that all previously developed methods and algorithms (RED, RECON, and some others) can find dispersed repeats only for x ≤ 1.0. We applied the IP method to find dispersed repeats in the genomes of and nine other bacterial species. We identified three families of approximately 1.09 × 10, 0.64 × 10, and 0.58 × 10 DNA bases, respectively, constituting almost 50% of the complete genome. The repeats are 400 to 600 bp long. The other analyzed bacterial genomes contain one to three families of dispersed repeats, with a total of 10 to 6 × 10 copies. The existence of such highly divergent repeats could be associated with a single-type triplet periodicity in various genes or with the packing of bacterial DNA into a nucleoid.
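The general idea of starting from a random PWM and refining it iteratively can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`random_pwm`, `score`, `best_windows`, `pwm_from_windows`, `iterate`), the log-odds weighting against a uniform 0.25 background, the pseudocount, and the greedy selection of non-overlapping windows are all assumptions made for illustration, and the published method's scoring and statistical evaluation are not reproduced here.

```python
import math
import random

ALPHABET = "ACGT"


def random_pwm(length, rng):
    """Hypothetical starting point: one row of 4 random weights per position."""
    return [[rng.gauss(0.0, 1.0) for _ in ALPHABET] for _ in range(length)]


def score(pwm, window):
    """Sum the PWM weight of the observed base at every position."""
    return sum(row[ALPHABET.index(base)] for row, base in zip(pwm, window))


def best_windows(pwm, genome, top_n):
    """Greedily pick the top_n highest-scoring, non-overlapping window starts."""
    length = len(pwm)
    scored = sorted(
        ((score(pwm, genome[s:s + length]), s)
         for s in range(len(genome) - length + 1)),
        reverse=True,
    )
    chosen = []
    for _, start in scored:
        if all(abs(start - other) >= length for other in chosen):
            chosen.append(start)
            if len(chosen) == top_n:
                break
    return sorted(chosen)


def pwm_from_windows(windows, pseudocount=1.0):
    """Rebuild a log-odds PWM from the selected windows (assumed uniform background)."""
    pwm = []
    for i in range(len(windows[0])):
        counts = [pseudocount] * len(ALPHABET)
        for window in windows:
            counts[ALPHABET.index(window[i])] += 1
        total = sum(counts)
        pwm.append([math.log((c / total) / 0.25) for c in counts])
    return pwm


def iterate(genome, length, top_n, rounds, seed=0):
    """Alternate between scanning the genome and re-estimating the PWM."""
    rng = random.Random(seed)
    pwm = random_pwm(length, rng)
    hits = []
    for _ in range(rounds):
        hits = best_windows(pwm, genome, top_n)
        pwm = pwm_from_windows([genome[s:s + length] for s in hits])
    return pwm, hits
```

A call such as `iterate(genome, length=500, top_n=1000, rounds=20)` would return the refined matrix and the start positions of the selected windows; the parameter values here are placeholders, not values taken from the paper. A single random start gives no guarantee of converging on a real repeat family, so a practical search would presumably try many random matrices and judge the result statistically.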