Azad Rajeev K, Rao J Subba, Li Wentian, Ramaswamy Ramakrishna
School of Environmental Sciences, Jawaharlal Nehru University, New Delhi 110 067, India.
Phys Rev E Stat Nonlin Soft Matter Phys. 2002 Sep;66(3 Pt 1):031913. doi: 10.1103/PhysRevE.66.031913. Epub 2002 Sep 25.
Using the Jensen-Shannon divergence as the measure of compositional difference, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbors, may nevertheless share compositional similarity with one or more distant (non-neighboring) domains. By giving compositionally similar domains the same label, we obtain a coarse-grained description of the DNA string in terms of a smaller set of distinct domain labels. This yields a minimal domain description of a given DNA sequence and significantly reduces its organizational complexity. The procedure offers a new means of evaluating genomic complexity across organisms ranging from bacteria to humans. The mosaic organization of DNA sequences could have originated from the insertion of fragments of one genome (the parasite) into another (the host), and we present numerical experiments that are suggestive of this scenario.
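The segmentation and labeling steps can be sketched in Python. For a cut of a sequence S (length n) into S1 (length n1) and S2 (length n2), the sketch uses the usual entropy form of the Jensen-Shannon divergence, D = H(S) - (n1/n) H(S1) - (n2/n) H(S2), where H is the Shannon entropy of nucleotide composition. It cuts at the position that maximizes D and recurses on both halves. This is a minimal sketch under stated assumptions, not the authors' implementation: a fixed cutoff `threshold` on D stands in for a statistical stopping criterion, and `min_len`, `threshold`, and the greedy `label_domains` rule (which lets non-neighboring domains share a label) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (bits) of a nucleotide composition given counts and length n."""
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / n
            h -= p * math.log2(p)
    return h


def best_split(seq, min_len):
    """Scan every cut point and return (position, D) maximizing the
    Jensen-Shannon divergence D between seq[:i] and seq[i:]."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy(total, n)
    left = Counter()
    best_i, best_d = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1  # left now holds the composition of seq[:i]
        if i < min_len or n - i < min_len:
            continue  # hypothetical minimum domain length
        right = total - left  # composition of seq[i:]
        d = (h_total
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq, threshold=0.05, min_len=10, offset=0):
    """Recursively cut seq at the maximum-D point while D >= threshold.
    Returns half-open (start, end) intervals in original coordinates."""
    if len(seq) >= 2 * min_len:
        i, d = best_split(seq, min_len)
        if i is not None and d >= threshold:
            return (segment(seq[:i], threshold, min_len, offset)
                    + segment(seq[i:], threshold, min_len, offset + i))
    return [(offset, offset + len(seq))]


def pair_divergence(c1, n1, c2, n2):
    """Length-weighted Jensen-Shannon divergence between two compositions."""
    n = n1 + n2
    return (entropy(c1 + c2, n)
            - (n1 / n) * entropy(c1, n1)
            - (n2 / n) * entropy(c2, n2))


def label_domains(seq, domains, threshold=0.05):
    """Hypothetical greedy labeling: a domain joins the first existing label
    whose pooled composition is within `threshold`, else starts a new label."""
    classes = []  # pooled (Counter, length) per label
    labels = []
    for start, end in domains:
        c, n = Counter(seq[start:end]), end - start
        for k, (ck, nk) in enumerate(classes):
            if pair_divergence(c, n, ck, nk) < threshold:
                classes[k] = (ck + c, nk + n)
                labels.append(k)
                break
        else:
            classes.append((c, n))
            labels.append(len(classes) - 1)
    return labels


if __name__ == "__main__":
    # Toy sequence with planted blocks; the two GC blocks are non-neighbors.
    seq = "A" * 60 + "GC" * 30 + "AT" * 30 + "GC" * 30
    domains = segment(seq)
    print(domains)
    print(label_domains(seq, domains))
```

On this toy sequence the sketch recovers the four planted blocks, (0, 60), (60, 120), (120, 180) and (180, 240), and gives the two distant GC blocks the same label, [0, 1, 2, 1]. This illustrates how four domains collapse to three labels.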