Data Compression Concepts and Algorithms and their Applications to Bioinformatics.

Authors

Nalbantoğlu O U, Russell D J, Sayood K

Affiliation

Department of Electrical Engineering, University of Nebraska, Lincoln, NE 68588-0511, USA.

Publication information

Entropy (Basel). 2010 Jan 1;12(1):34. doi: 10.3390/e12010034.

Abstract

Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
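
As a rough illustration of the quantities named in the abstract, the following minimal sketch computes a first-order entropy, an average mutual information at a fixed lag, and a compression-based complexity proxy for a nucleotide string. It is not the authors' method: the function names are hypothetical, and the general-purpose zlib compressor is an assumption standing in for the specialized sequence compressors a real analysis would use.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """First-order entropy in bits per symbol, from symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def average_mutual_information(seq: str, k: int) -> float:
    """Mutual information in bits between symbols k positions apart (k >= 1)."""
    pairs = Counter(zip(seq, seq[k:]))   # joint counts of (x_i, x_{i+k})
    total = sum(pairs.values())
    left = Counter(seq[:-k])             # marginal counts of the first symbol
    right = Counter(seq[k:])             # marginal counts of the second symbol
    ami = 0.0
    for (a, b), c in pairs.items():
        p_ab = c / total
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        ami += p_ab * math.log2(p_ab * total * total / (left[a] * right[b]))
    return ami


def compression_ratio(seq: str) -> float:
    """Compressed size over original size; a crude complexity proxy (zlib is an assumption)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    example = "ACGT" * 1000  # hypothetical, highly repetitive sequence
    print(shannon_entropy(example))
    print(average_mutual_information(example, 1))
    print(compression_ratio(example))
```

On a perfectly periodic string like the one above, the symbol frequencies look uniform, so the first-order entropy alone suggests no structure, while the lag-1 mutual information and the compression ratio both expose the repetition. This is the kind of hidden pattern that such measures are used to detect.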
