Nikolaou Christoforos
Computational Genomics Group, Department of Biology, University of Crete, 71409 Herakleion, Crete, Greece.
Comput Biol Chem. 2014 Dec;53 Pt A:134-43. doi: 10.1016/j.compbiolchem.2014.08.018. Epub 2014 Aug 20.
Genomic sequences exhibit self-organization properties at various hierarchical levels. One such level is the gene structure of higher eukaryotes, with its complex exon/intron arrangement. Exon sizes and exon numbers in genes have been shown to conform to a law that comes from statistical linguistics and was formulated by Menzerath and Altmann. According to this law, the mean size of the constituents of an entity is inversely related to the number of those constituents. Here we perform a detailed analysis of this property across the complete exon set of the mouse genome, relating it to the sequence conservation of each exon and the transcriptional complexity of each gene locus. We show that extensive linear fits, which indicate conformity to the Menzerath-Altmann law, are restricted to a particular subset of genes. These genes are formed by exons under low or intermediate sequence constraint and have a small number of alternative transcripts. Based on this observation, we hypothesize that the Menzerath-Altmann law in mammalian genes arises predominantly from genes that are more versatile in function and therefore more prone to changes in their structure. In support of this hypothesis, we present a test case in which gene categories of different functionality also differ in how closely they conform to the Menzerath-Altmann law.
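As a rough illustration of what "conformity to the Menzerath-Altmann law" can mean for genes, assume the commonly used simplified form y = a * x^(-b), where x is the number of exons in a gene and y is the mean exon length. Taking logarithms gives log y = log a - b log x, which is a straight line in log-log space. The sketch below groups genes by exon count, averages exon length within each group, and fits that line by ordinary least squares. It is a hypothetical sketch, not necessarily the fitting procedure used in this study. The function name `menzerath_altmann_fit` and its input format (one list of exon lengths per gene) are assumptions made for illustration.

```python
import math
from collections import defaultdict


def menzerath_altmann_fit(genes):
    """Hypothetical helper: fit the simplified law y = a * x**(-b).

    genes: iterable of lists, each list holding the exon lengths of one gene
    (an assumed input format). Returns (a, b, r2), where r2 is the
    coefficient of determination of the log-log linear fit.
    """
    # Pool exon lengths by the number of exons in their gene.
    by_count = defaultdict(list)
    for exons in genes:
        if exons:
            by_count[len(exons)].extend(exons)

    # One point per exon-count class: (log exon count, log mean exon length).
    xs, ys = [], []
    for n, lengths in sorted(by_count.items()):
        xs.append(math.log(n))
        ys.append(math.log(sum(lengths) / len(lengths)))
    if len(xs) < 2:
        raise ValueError("need at least two exon-count classes to fit a line")

    # Ordinary least squares for ys = intercept + slope * xs.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Goodness of fit in log-log space.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

    # A negative slope corresponds to a positive exponent b in y = a * x**(-b).
    return math.exp(intercept), -slope, r2


if __name__ == "__main__":
    # Synthetic genes that follow the law exactly: n exons, each 1000 * n**-0.5 bp.
    genes = [[1000.0 * n ** -0.5] * n for n in range(1, 11)]
    a, b, r2 = menzerath_altmann_fit(genes)
    print(round(a, 3), round(b, 3), round(r2, 3))  # → 1000.0 0.5 1.0
```

On real exon sets the points scatter around the line. One reasonable way to compare gene subsets, for example exons grouped by conservation level or genes grouped by number of alternative transcripts, is to fit each subset separately and compare the resulting b and r2. A positive b with r2 close to 1 indicates close conformity to the law.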