Graduate Program in Informatics, Federal University of Technology - Paraná, Cornélio Procópio, Paraná, Brazil.
PLoS Comput Biol. 2013;9(10):e1003234. doi: 10.1371/journal.pcbi.1003234. Epub 2013 Oct 3.
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework for implementing a range of bioinformatics applications by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model (GHMM); and (viii) similarity-based sequence weighting. The framework includes functionality for training, simulation, and decoding of the models. It also provides two criteria to guide parameter setting: the Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex multi-model probabilistic architectures using GHMMs. In particular, the framework provides a novel, flexible implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
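To illustrate how AIC and BIC can guide a parameter choice such as the order of a Markov chain, here is a minimal Python sketch. It is not the ToPS interface: the function names `markov_log_likelihood` and `information_criteria`, the default four-letter alphabet, and the maximum-likelihood fit from raw counts are all assumptions made for illustration. The criteria themselves are the standard ones, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood.

```python
import math
from collections import Counter


def markov_log_likelihood(seq, order):
    """Maximized log-likelihood of a fixed-order Markov chain fit to seq.

    Illustrative helper (not part of ToPS). The first `order` symbols are
    used only as context, so the likelihood covers len(seq) - order symbols.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        context_counts[ctx] += 1
        transition_counts[(ctx, seq[i])] += 1
    # Maximum-likelihood estimate of P(symbol | ctx) is count / context count.
    return sum(
        n * math.log(n / context_counts[ctx])
        for (ctx, _symbol), n in transition_counts.items()
    )


def information_criteria(seq, order, alphabet_size=4):
    """Return (AIC, BIC) for a Markov chain of the given order.

    Assumes every context has alphabet_size - 1 free probabilities.
    """
    log_l = markov_log_likelihood(seq, order)
    k = (alphabet_size ** order) * (alphabet_size - 1)  # free parameters
    n = len(seq) - order                                # scored symbols
    aic = 2 * k - 2 * log_l
    bic = k * math.log(n) - 2 * log_l
    return aic, bic


if __name__ == "__main__":
    dna = "ACGT" * 25  # toy sequence, purely for demonstration
    for order in range(3):
        aic, bic = information_criteria(dna, order)
        print(f"order={order}  AIC={aic:.2f}  BIC={bic:.2f}")
```

The order with the lowest criterion value would be preferred. BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 (ln n > 2), so it tends to select lower orders. Note that in this sketch the number of scored symbols differs slightly between orders, which a careful comparison would equalize.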