Giancarlo Raffaele, Scaturro Davide, Utro Filippo
Dipartimento di Matematica ed Applicazioni, Università di Palermo, Palermo, Italy.
Bioinformatics. 2009 Jul 1;25(13):1575-86. doi: 10.1093/bioinformatics/btp117. Epub 2009 Feb 27.
Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest only for data communication and storage. However, they are also deeply related to classification, data mining and analysis. In recent years, substantial effort has gone into applying textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks.
The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided.
Most of the research results reviewed here offer software prototypes to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to providing references to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is available at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/