Nalbantoğlu O U, Russell D J, Sayood K
Department of Electrical Engineering, University of Nebraska, Lincoln, NE 68588-0511, USA.
Entropy (Basel). 2010 Jan 1;12(1):34. doi: 10.3390/e12010034.
Data compression is, at its base, concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information, and hence to data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in bioinformatics. We look at how basic theoretical ideas from data compression, such as entropy, mutual information, and complexity, have been used to analyze biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms, and study viral populations. Finally, we look at how grammars inferred from biological sequences have been used to uncover their structure.
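As a rough illustration of two of the notions named above, the sketch below computes the order-0 empirical entropy of a nucleotide sequence and a compression-based distance between two sequences. It is a minimal sketch, not the method of any work covered by the review. The distance is the normalized compression distance (NCD), swapped in here as one well-known, computable stand-in for complexity-based comparison, with `zlib` assumed as the compressor. The helpers `random_dna` and `mutate` are hypothetical and generate synthetic data for demonstration only.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Order-0 empirical entropy in bits per symbol: H = sum_x p(x) * log2(1 / p(x))."""
    if not seq:
        return 0.0
    counts = Counter(seq)
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def compressed_size(data: bytes) -> int:
    """Length in bytes after zlib compression at level 9 (assumed proxy for complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C the compressed length.
    Values near 0 suggest shared structure; values near 1 suggest little."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, rng: random.Random) -> str:
    """Hypothetical helper: i.i.d. uniformly chosen bases, for demonstration only."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def mutate(seq: str, n_sites: int, rng: random.Random) -> str:
    """Hypothetical helper: redraw the base at n_sites randomly chosen positions."""
    bases = list(seq)
    for i in rng.sample(range(len(bases)), n_sites):
        bases[i] = rng.choice("ACGT")
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(0)
    a = random_dna(2000, rng)
    b = mutate(a, 20, rng)     # close "relative" of a
    c = random_dna(2000, rng)  # unrelated sequence
    print(f"H(a) = {shannon_entropy(a):.3f} bits/base")
    print(f"NCD(a, b) = {ncd(a, b):.3f}")
    print(f"NCD(a, c) = {ncd(a, c):.3f}")
```

A uniform sequence such as `"ACGT" * 25` has entropy 2 bits per base, while a single repeated base has entropy 0. In the demonstration, the mutated copy `b` should sit much closer to `a` under NCD than the unrelated sequence `c` does, because the compressor can encode most of `b` as back-references into `a`. Pairwise distances of this kind could then feed a standard tree-building method; that step is not shown.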