Elhaik Eran, Graur Dan
Department of Mental Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA.
Department of Biology and Biochemistry, University of Houston, Houston, TX 77204-5001, USA.
ISRN Bioinform. 2013 Apr 18;2013:725434. doi: 10.1155/2013/725434. eCollection 2013.
Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called "compositional domains," each with a distinct GC content that differs significantly from that of its upstream and downstream neighbors. A typical animal genome consists of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions, interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and gene density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between humans and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter(+), that carries out all segmentation analyses, including graphical display, and we built a repository of compositional domain maps for all fully sequenced vertebrate and invertebrate genomes. Together, the IsoPlotter(+) pipeline and repository offer a comprehensive solution for studying genome compositional architecture. Here, we demonstrate IsoPlotter(+) by applying it to human and insect genomes. The computational tools and data repository are available online.
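To make the idea of "compositional domains" concrete, the following is a minimal sketch of how a sequence might be cut into regions whose GC content differs from that of their neighbors. It is not IsoPlotter's implementation: the abstract does not describe the segmentation criterion, so the sketch assumes a generic recursive binary segmentation that splits wherever an entropy-based (Jensen-Shannon-style) divergence between the left and right parts is largest, and stops when that divergence falls below a fixed threshold. The function names, the `min_len` and `threshold` parameters, and their default values are all hypothetical choices made for illustration.

```python
import math


def gc_prefix(seq):
    """Prefix sums of G/C counts: prefix[i] = number of G or C in seq[:i]."""
    prefix = [0]
    for base in seq.upper():
        prefix.append(prefix[-1] + (base in "GC"))
    return prefix


def entropy(p):
    """Binary Shannon entropy (in bits) of a GC fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def best_split(prefix, start, end, min_len):
    """Return (divergence, position) for the split of [start, end) that
    maximizes the entropy-based divergence between the two halves."""
    n = end - start
    h_all = entropy((prefix[end] - prefix[start]) / n)
    best = (0.0, None)
    for i in range(start + min_len, end - min_len + 1):
        left, right = i - start, end - i
        h_left = entropy((prefix[i] - prefix[start]) / left)
        h_right = entropy((prefix[end] - prefix[i]) / right)
        d = h_all - (left / n) * h_left - (right / n) * h_right
        if d > best[0]:
            best = (d, i)
    return best


def segment(seq, min_len=50, threshold=0.05):
    """Split seq into (start, end, gc_fraction) domains, left to right.

    min_len and threshold are illustrative assumptions, not values
    taken from IsoPlotter.
    """
    if not seq:
        return []
    prefix = gc_prefix(seq)
    domains = []
    stack = [(0, len(seq))]
    while stack:
        start, end = stack.pop()
        d, pos = 0.0, None
        if end - start >= 2 * min_len:
            d, pos = best_split(prefix, start, end, min_len)
        if pos is not None and d > threshold:
            # Push the right part first so the left part is processed first.
            stack.append((pos, end))
            stack.append((start, pos))
        else:
            gc = (prefix[end] - prefix[start]) / (end - start)
            domains.append((start, end, gc))
    return domains


if __name__ == "__main__":
    toy = "AT" * 100 + "GC" * 100  # an AT-only block followed by a GC-only block
    for start, end, gc in segment(toy):
        print(f"{start}-{end}: GC = {gc:.2f}")
```

On a toy sequence made of an AT-only block followed by a GC-only block, the sketch reports two domains with the boundary at the junction. A fixed threshold is the simplest possible stopping rule and was chosen here only to keep the example short; a real analysis would need a statistically justified stopping criterion and would have to handle ambiguous bases such as N.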