
MAFCO: a compression tool for MAF files.

Authors

Matos Luís M O, Neves António J R, Pratas Diogo, Pinho Armando J

Affiliation

Signal Processing Lab, IEETA/DETI, University of Aveiro, 3810-193 Aveiro, Portugal.

Publication

PLoS One. 2015 Mar 27;10(3):e0116082. doi: 10.1371/journal.pone.0116082. eCollection 2015.

DOI:10.1371/journal.pone.0116082
PMID:25816229
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4376647/
Abstract

In the last decade, the cost of genomic sequencing has been decreasing so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be efficiently stored, because storage cost is not decreasing as fast as the cost of sequencing. In order to overcome this problem, the most popular general-purpose compression tool, gzip, is usually used. However, these tools were not specifically designed to compress this kind of data, and often fall short when the intention is to reduce the data size as much as possible. There are several compression algorithms available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, containing alignments between entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from 34% to 57%, depending on the data set. When compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source-code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco.
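The comparison against gzip described in the abstract can be illustrated with a small sketch. It is not MAFCO's method; it only shows how a gzip baseline size and a relative "compression gain" might be computed. The sample MAF block is made up for illustration, the helper names are hypothetical, and the gain formula (percentage size reduction relative to the baseline) is an assumed definition, since the abstract does not spell one out.

```python
import gzip

# A tiny MAF-style alignment block, invented for illustration only.
# Each "s" line: source, start, size, strand, source length, aligned text.
HEADER = b"##maf version=1\n"
BLOCK = (
    b"a score=100.0\n"
    b"s hg19.chr1 100 12 + 249250621 ACGTACGTACGT\n"
    b"s mm9.chr4  200 11 + 155630120 ACGTAC-TACGT\n"
    b"\n"
)
SAMPLE_MAF = HEADER + BLOCK * 200


def gzip_size(data: bytes) -> int:
    """Size in bytes of the gzip-compressed data (general-purpose baseline)."""
    return len(gzip.compress(data, compresslevel=9))


def is_lossless(data: bytes) -> bool:
    """Check that gzip reproduces the input exactly after a round trip."""
    return gzip.decompress(gzip.compress(data)) == data


def compression_gain(baseline_size: int, tool_size: int) -> float:
    """Assumed definition: percentage size reduction relative to the baseline."""
    return 100.0 * (1.0 - tool_size / baseline_size)


if __name__ == "__main__":
    baseline = gzip_size(SAMPLE_MAF)
    print(f"original: {len(SAMPLE_MAF)} bytes, gzip: {baseline} bytes")
    print(f"lossless round trip: {is_lossless(SAMPLE_MAF)}")
    # Hypothetical sizes: a dedicated tool producing 660 bytes where gzip
    # produces 1000 bytes would correspond to a gain of roughly 34%.
    print(f"example gain: {compression_gain(1000, 660):.1f}%")
```

Under this assumed definition, a reported gain of 34% to 57% would mean the dedicated tool's output is 34% to 57% smaller than the gzip output for the same file.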

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/af9fa0dedd9c/pone.0116082.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/39ef45169f2a/pone.0116082.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/0fe5d0c5b983/pone.0116082.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/dff787f324f9/pone.0116082.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/6dce3c9ba0d3/pone.0116082.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc30/4376647/bd879926c40a/pone.0116082.g006.jpg

Similar articles

1. MAFCO: a compression tool for MAF files. PLoS One. 2015 Mar 27;10(3):e0116082. doi: 10.1371/journal.pone.0116082. eCollection 2015.
2. MFCompress: a compression tool for FASTA and multi-FASTA data. Bioinformatics. 2014 Jan 1;30(1):117-8. doi: 10.1093/bioinformatics/btt594. Epub 2013 Oct 16.
3. SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics. 2012 Dec 1;28(23):3051-7. doi: 10.1093/bioinformatics/bts593. Epub 2012 Oct 9.
4. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis. Bioinformatics. 2012 Oct 1;28(19):2527-9. doi: 10.1093/bioinformatics/bts467. Epub 2012 Jul 25.
5. GDC 2: Compression of large collections of genomes. Sci Rep. 2015 Jun 25;5:11565. doi: 10.1038/srep11565.
6. KungFQ: a simple and powerful approach to compress fastq files. IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1837-42. doi: 10.1109/TCBB.2012.123.
7. CSAM: Compressed SAM format. Bioinformatics. 2016 Dec 15;32(24):3709-3716. doi: 10.1093/bioinformatics/btw543. Epub 2016 Aug 18.
8. smallWig: parallel compression of RNA-seq WIG files. Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
9. CoGI: Towards Compressing Genomes as an Image. IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1275-85. doi: 10.1109/TCBB.2015.2430331.
10. LFQC: a lossless compression algorithm for FASTQ files. Bioinformatics. 2015 Oct 15;31(20):3276-81. doi: 10.1093/bioinformatics/btv384. Epub 2015 Jun 20.
