Clayton School of Information Technology, Monash University, Clayton, VIC, 3800, Australia.
Adv Exp Med Biol. 2011;696:657-66. doi: 10.1007/978-1-4419-7046-6_67.
A biological compression model, the expert model, is presented that outperforms existing compression algorithms in both compression performance and speed. The model can compress whole eukaryotic genomes. Most importantly, it provides a framework for knowledge discovery from biological data and can be used for repeat element discovery, sequence alignment, and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences, cases where conventional knowledge discovery tools often fail.