Nuel Gregory
Laboratoire Statistique et Genome, CNRS (8071), INRA (1152), UEVE, Evry, France.
Stat Appl Genet Mol Biol. 2006;5:Article26. doi: 10.2202/1544-6115.1219. Epub 2006 Oct 17.
We review the methods available for computing pattern statistics on text generated by a Markov source. Both theoretical and numerical aspects are detailed for a wide range of techniques (exact, Gaussian, large deviations, binomial, and compound Poisson). The SPatt package (Statistics for Pattern, free software available at http://stat.genopole.cnrs.fr/spatt), which implements all of these methods, is then used to compare the approaches in terms of computational time and reliability on what is, at present, the most complete pattern statistics benchmark available.
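As a rough illustration of what a pattern statistic on a Markov source looks like, the sketch below computes the expected number of occurrences of a word under a stationary first-order Markov chain, then a binomial-style upper-tail probability for an observed count. It is a minimal sketch under stated assumptions: the function names, the alphabet, and the transition matrix are hypothetical and are not taken from SPatt, and the binomial tail treats positions as independent trials, which is only an approximation for overlapping occurrences.

```python
from math import comb


def stationary(P, iters=1000):
    """Approximate the stationary distribution of transition matrix P
    by power iteration (assumes the chain is ergodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def word_probability(word, alphabet, P, pi):
    """Probability that the word occurs at a given position of a
    stationary first-order Markov sequence."""
    idx = {a: i for i, a in enumerate(alphabet)}
    p = pi[idx[word[0]]]
    for a, b in zip(word, word[1:]):
        p *= P[idx[a]][idx[b]]
    return p


def expected_count(word, n, alphabet, P, pi):
    """Expected number of (possibly overlapping) occurrences of the word
    in a sequence of length n."""
    return (n - len(word) + 1) * word_probability(word, alphabet, P, pi)


def binomial_upper_tail(k, trials, p):
    """P(X >= k) for X ~ Binomial(trials, p), by direct summation.
    Adequate for small illustrative inputs; not numerically robust."""
    below = sum(comb(trials, i) * p**i * (1 - p) ** (trials - i) for i in range(k))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    alphabet = "ACGT"
    # Hypothetical transition matrix (each row sums to 1), for illustration only.
    P = [
        [0.4, 0.2, 0.2, 0.2],
        [0.3, 0.3, 0.2, 0.2],
        [0.2, 0.3, 0.3, 0.2],
        [0.1, 0.2, 0.3, 0.4],
    ]
    pi = stationary(P)
    n, word, observed = 200, "ACG", 6
    mu = expected_count(word, n, alphabet, P, pi)
    p_word = word_probability(word, alphabet, P, pi)
    tail = binomial_upper_tail(observed, n - len(word) + 1, p_word)
    print(f"expected count: {mu:.3f}, approx P(N >= {observed}): {tail:.4g}")
```

Exact, Gaussian, large-deviation, and compound Poisson computations differ in how they account for the dependence between overlapping occurrences; the binomial-style tail above is shown only because it is the simplest to write down.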