Melodelima Christelle, Gautier Christian, Piau Didier
UMR 5558 CNRS Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 43 boulevard du 11 Novembre 1818, 69622 Villeurbanne Cedex, France.
J Math Biol. 2007 Sep;55(3):353-64. doi: 10.1007/s00285-007-0087-5. Epub 2007 May 8.
Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to the analysis of complex structures. For example, the duration of stay in a state of an HMM must follow a geometric distribution. Numerous other methodological difficulties arise when HMMs are used to discriminate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. To this end, we introduce macro-states to model the length distributions of the regions. Our study shows that a simple HMM can be used to model the isochore organisation of the mouse genome. This potential use of Markovian models as an aid to data exploration has been underestimated until now.
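The effect of convolving geometric distributions can be illustrated numerically. The sketch below is not taken from the paper: it assumes a macro-state built as a chain of `k` sub-states that share one exit probability `p`, so that the total length of a region is the sum of `k` independent geometric durations. The function names, and the values `p = 0.3` and `k = 5`, are illustrative assumptions only.

```python
from math import comb


def geometric_pmf(p, max_len):
    """P(L = l) for l = 0..max_len when the state is left with probability p at each step."""
    # Index 0 holds probability 0: a visit to a state lasts at least one position.
    return [0.0] + [(1 - p) ** (l - 1) * p for l in range(1, max_len + 1)]


def convolve(a, b, max_len):
    """Distribution of the sum of two independent lengths, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > max_len:
                break
            out[i + j] += ai * bj
    return out


def macro_state_pmf(p, k, max_len):
    """Length distribution of a hypothetical macro-state: k chained sub-states with exit probability p."""
    pmf = geometric_pmf(p, max_len)
    for _ in range(k - 1):
        pmf = convolve(pmf, geometric_pmf(p, max_len), max_len)
    return pmf


def mode(pmf):
    """Most probable length."""
    return max(range(len(pmf)), key=pmf.__getitem__)


if __name__ == "__main__":
    p, k, max_len = 0.3, 5, 200  # illustrative values, not estimates from genome data
    single = geometric_pmf(p, max_len)
    macro = macro_state_pmf(p, k, max_len)
    print("mode of a single state:", mode(single))
    print("mode of the macro-state:", mode(macro))
    # The convolution agrees with the closed form C(l-1, k-1) p^k (1-p)^(l-k).
    l = 20
    print(macro[l], comb(l - 1, k - 1) * p ** k * (1 - p) ** (l - k))
```

Under these assumptions, a single state always has its most probable length at 1, whereas the chained macro-state has its mode away from 1 and a bell-shaped profile. Sub-states with different exit probabilities would give a wider family of shapes at the cost of more parameters.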