Pattern statistics on Markov chains and sensitivity to parameter estimation

Author information

Nuel Grégory

Affiliation

Laboratoire Statistique et Génome, University of Evry, CNRS (8071), INRA (1152), 523, place des terrasses de l'Agora, 91034 Evry CEDEX, France.

Publication information

Algorithms Mol Biol. 2006 Oct 17;1:17. doi: 10.1186/1748-7188-1-17.

DOI:10.1186/1748-7188-1-17

PMID:17044916

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1647278/

Abstract

BACKGROUND

Pattern statistics in computational biology are commonly computed under a Markov model that accounts for sequence composition, and the parameters of this model must usually be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what consequences this variability has for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, etc.).

RESULTS

In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta method to give an explicit expression for sigma, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered.
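
To make the setting concrete, the following is a minimal Python sketch, not the paper's method. It estimates a Markov model from a sequence, scores a word with a binomial approximation of its overlapping count, and then gauges the spread of that score by simulation (re-estimating the parameters on sequences drawn from the fitted model). It does not reproduce the explicit delta-method expression for sigma. All function names, the score definition (-log10 of the upper binomial tail), the DNA alphabet, and the restriction to order >= 1 are assumptions made for illustration.

```python
import math
import random
import statistics
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumed alphabet (DNA)

def estimate_model(seq, order=1):
    """Maximum-likelihood estimates of context frequencies and transitions."""
    ctx_counts, pair_counts = Counter(), defaultdict(Counter)
    for i in range(len(seq) - order):
        ctx = seq[i:i + order]
        ctx_counts[ctx] += 1
        pair_counts[ctx][seq[i + order]] += 1
    total = sum(ctx_counts.values())
    freq = {c: n / total for c, n in ctx_counts.items()}
    trans = {c: {a: k / ctx_counts[c] for a, k in nxt.items()}
             for c, nxt in pair_counts.items()}
    return freq, trans

def word_probability(word, freq, trans, order=1):
    """Probability that `word` starts at a given position under the model."""
    p = freq.get(word[:order], 0.0)
    for i in range(order, len(word)):
        p *= trans.get(word[i - order:i], {}).get(word[i], 0.0)
    return p

def count_overlapping(seq, word):
    """Number of (possibly overlapping) occurrences of `word` in `seq`."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def log10_binomial_tail(n_obs, n, p):
    """log10 of P(N >= n_obs) for N ~ Binomial(n, p), summed in log space."""
    if n_obs <= 0 or p >= 1.0:
        return 0.0
    if p <= 0.0 or n_obs > n:
        return -math.inf
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
            for k in range(n_obs, n + 1)]
    m = max(logs)
    return (m + math.log(sum(math.exp(x - m) for x in logs))) / math.log(10)

def simulate(length, freq, trans, order, rng):
    """Draw a sequence of at least `length` letters from the fitted model."""
    contexts = list(freq)
    seq = rng.choices(contexts, weights=[freq[c] for c in contexts])[0]
    while len(seq) < length:
        nxt = trans.get(seq[-order:])
        if nxt:
            letters = list(nxt)
            seq += rng.choices(letters, weights=[nxt[a] for a in letters])[0]
        else:  # context never seen with a successor: fall back to uniform
            seq += rng.choice(ALPHABET)
    return seq[:length]

def simulated_sigma(seq, word, order=1, n_rep=200, seed=0):
    """Standard deviation of the score when only the parameters vary.

    The observed count is held fixed; each replicate re-estimates the model
    on a sequence simulated from the parameters fitted to `seq`.
    """
    rng = random.Random(seed)
    freq, trans = estimate_model(seq, order)
    n_obs = count_overlapping(seq, word)
    n_pos = len(seq) - len(word) + 1
    scores = []
    for _ in range(n_rep):
        f_b, t_b = estimate_model(simulate(len(seq), freq, trans, order, rng), order)
        s = -log10_binomial_tail(n_obs, n_pos, word_probability(word, f_b, t_b, order))
        if math.isfinite(s):  # skip replicates where the word has probability 0
            scores.append(s)
    return statistics.stdev(scores)
```

For example, `simulated_sigma(seq, "ACG", order=1)` returns an empirical spread for the score of the word `ACG`; repeating the call with a larger `order` on the same sequence is one way to observe how that spread changes as the number of estimated parameters grows.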

CONCLUSION

We establish that using a high-order Markov model can easily lead to major mistakes, because pattern statistics are highly sensitive to parameter estimation.

Summary