Melodelima Christelle, Guéguen Laurent, Piau Didier, Gautier Christian
Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard-Lyon I, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
Gene. 2006 Dec 30;385:41-9. doi: 10.1016/j.gene.2006.04.032. Epub 2006 Aug 17.
Mammalian genomes are organised into a mosaic of regions called isochores (generally more than 300 kb long), each with a distinct, relatively homogeneous G+C content. G+C content is the defining characteristic of isochores, but isochores have also been associated with many other biological properties. For instance, genes are more compact and gene density is highest in G+C rich isochores. Various methods for locating isochores in the human genome have been developed, but they use only the base composition of the DNA sequence. This paper proposes a new method, based on a hidden Markov model, that takes into account several of the biological properties associated with the isochore structure of a genome. The method yields a good segmentation of the human genome into isochores and also permits a new analysis of the known heterogeneity of G+C rich isochores: most (60%) of the G+C poor genes embedded in G+C rich isochores have UTR sequences characteristic of G+C rich genes. This genomic feature is discussed in the context of both evolution and genome function.
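The abstract describes segmenting a genome into isochores with a hidden Markov model. As a rough illustration of that general idea only, the sketch below decodes a toy two-state HMM ("poor" and "rich" G+C) over fixed-size windows with the Viterbi algorithm. It is not the authors' model: it uses base composition alone, whereas the paper's method also draws on other biological properties of isochores. The state names, Gaussian emission means, standard deviation, self-transition probability, and window size are all hypothetical values chosen for the example.

```python
import math

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return sum(1 for b in seq if b in "GC") / len(seq) if seq else 0.0

def window_gc(seq, size):
    """G+C fraction of consecutive non-overlapping windows of `size` bases."""
    return [gc_fraction(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]

def log_gauss(x, mean, sd):
    """Log density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi_segment(gc_values, means, sd=0.05, stay=0.9):
    """Most likely state path for a list of window G+C values.

    `means` maps each state name to its assumed mean G+C fraction.
    `stay` is the assumed probability of remaining in the same state;
    the remainder is split evenly among the other states (needs >= 2 states).
    """
    if not gc_values:
        return []
    states = list(means)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (len(states) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform start distribution over states.
    score = {s: -math.log(len(states)) + log_gauss(gc_values[0], means[s], sd)
             for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in gc_values[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda p: score[p] + trans(p, s))
            pointers[s] = best
            new_score[s] = score[best] + trans(best, s) + log_gauss(x, means[s], sd)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical emission means for the two illustrative states.
MEANS = {"poor": 0.38, "rich": 0.52}

if __name__ == "__main__":
    toy = "AT" * 50 + "GC" * 50  # artificial sequence, not real genomic data
    print(viterbi_segment(window_gc(toy, 20), MEANS))
```

On real data the windows would be far larger (isochores generally exceed 300 kb), and the emission and transition parameters would be estimated from the genome rather than fixed by hand.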