Data Compression Concepts and Algorithms and their Applications to Bioinformatics.

Authors

Nalbantoğlu O U, Russell D J, Sayood K

Affiliation

Department of Electrical Engineering, University of Nebraska, Lincoln, NE 68588-0511, USA.

Publication information

Entropy (Basel). 2010 Jan 1;12(1):34. doi: 10.3390/e12010034.

Abstract

Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence to data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity, have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms, and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in those sequences.
