Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia.
Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.
BMC Genomics. 2019 May 8;20(Suppl 3):295. doi: 10.1186/s12864-019-5536-1.
Mitochondria are the powerhouses of eukaryotic cells and carry their own circular DNA (mtDNA), which encodes various RNAs and proteins. Somatic perturbations of mtDNA accumulate with age, so it is of great importance to uncover the main sources of mtDNA instability. Recent analyses have demonstrated that somatic mtDNA deletions depend on imperfect repeats of various kinds between distant mtDNA segments. However, there is so far no comprehensive database annotating all types of imperfect repeats across the many species with completely sequenced mitochondrial genomes, and no algorithm capable of calling all types of imperfect repeats in circular mtDNA.
We implemented a naïve pattern-recognition algorithm, by analogy to standard dot-plot construction procedures, that finds both perfect and imperfect repeats of four main types: direct, inverted, mirror and complementary. The algorithm is adapted to specific characteristics of mtDNA, such as circularity and an excess of short repeats: it calls imperfect repeats starting from a length of 10 bp. We constructed an interactive, web-accessible database, ImtRDB, which stores the positions of perfect and imperfect repeats in the mtDNAs of more than 3500 vertebrate species. Additional tools, such as visualization of repeats within a genome, comparison of repeat densities among different genomes, and the option to download all results, make this database useful for many biologists. Our first analyses of the database demonstrated that: (i) mtDNA imperfect repeats are usually short; (ii) they are associated with unfolded DNA structures; (iii) the four types of repeats correlate positively with each other and form two equivalent pairs, direct and mirror versus inverted and complementary, with identical nucleotide content and similar distributions among species; (iv) the abundance of repeats is negatively associated with GC content; (v) GC dinucleotides, relative to CG dinucleotides, are overrepresented on the light chain of mtDNA covered by repeats.
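The four repeat types and the circular, mismatch-tolerant search can be illustrated with a minimal Python sketch. This is not the authors' software: the function names (`transform`, `window`, `find_repeats`), the fixed window length `k`, and the use of a simple mismatch count as the measure of degeneracy are all assumptions made for illustration. The sketch compares every pair of non-overlapping windows, wrapping around the end of the sequence, much as a dot plot compares every pair of positions.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the ImtRDB software.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transform(seq: str, kind: str) -> str:
    """Return the sequence that the second arm of a repeat must resemble."""
    if kind == "direct":          # same sequence, same orientation
        return seq
    if kind == "mirror":          # reversed sequence
        return seq[::-1]
    if kind == "complementary":   # complemented, not reversed
        return seq.translate(COMPLEMENT)
    if kind == "inverted":        # reverse complement
        return seq.translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown repeat type: {kind}")


def window(genome: str, start: int, k: int) -> str:
    """k-mer starting at `start`, wrapping around a circular genome."""
    n = len(genome)
    return "".join(genome[(start + i) % n] for i in range(k))


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_repeats(genome: str, k: int = 10, max_mismatches: int = 1,
                 kind: str = "direct") -> list[tuple[int, int, int]]:
    """Naive all-against-all search; returns (start1, start2, mismatches)."""
    n = len(genome)
    hits = []
    for i in range(n):
        target = transform(window(genome, i, k), kind)
        for j in range(i + 1, n):
            # Skip overlapping windows, including overlap across the origin.
            if min(j - i, n - (j - i)) < k:
                continue
            d = mismatches(target, window(genome, j, k))
            if d <= max_mismatches:
                hits.append((i, j, d))
    return hits


if __name__ == "__main__":
    # A perfect direct repeat "ACGTA" at positions 0 and 10.
    print(find_repeats("ACGTATTTTTACGTACCCCC", k=5, max_mismatches=0))
    # → [(0, 10, 0)]
```

Each pair is reported once with `start1 < start2`; this is sufficient because all four relations are symmetric between the two arms. The search is quadratic in genome length, which is workable for genomes the size of vertebrate mtDNA but would need indexing for larger sequences.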
ImtRDB is available at http://bioinfodbs.kantiana.ru/ImtRDB/ . It is accompanied by software that calls all types of interspersed repeats, with different levels of degeneracy, in circular DNA. This database and software can become a very useful tool in various areas of mitochondrial and chloroplast DNA research.