ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.

Affiliations

Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

The Henry Samueli School of Engineering, Center for Pervasive Communications and Computing (CPCC), University of California, Irvine, CA 92697, USA.

Publication information

Bioinformatics. 2018 Mar 15;34(6):911-919. doi: 10.1093/bioinformatics/btx685.

DOI:10.1093/bioinformatics/btx685
PMID:29087447
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860022/

Abstract

MOTIVATION

Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access and summary statistics lookups, and is based on the asymptotic theory of optimal point density design for nonuniform quantizers.
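The abstract does not spell out the quantizer design, so the following is only an illustrative sketch of the general idea behind point-density-based nonuniform quantization, not ChIPWig's actual algorithm. It assumes a mean-squared-error distortion, for which the classical high-resolution result places reconstruction levels with density proportional to the cube root of the signal density. The function names `design_levels` and `quantize`, the histogram bin count, and the synthetic exponential data are all hypothetical choices made for this example.

```python
import numpy as np

def design_levels(samples, n_levels, n_bins=256):
    """Return n_levels reconstruction points whose density follows f(x)^(1/3).

    Assumption: mean-squared-error distortion (classical high-resolution
    rule). The density f(x) is estimated with a plain histogram.
    """
    hist, edges = np.histogram(samples, bins=n_bins, density=True)
    # Per-bin integral of f^(1/3); the tiny offset keeps the CDF strictly increasing.
    weights = np.cbrt(hist + 1e-12) * np.diff(edges)
    cdf = np.concatenate(([0.0], np.cumsum(weights)))
    cdf /= cdf[-1]
    # Invert the cube-root CDF at the centres of n_levels equal slices.
    targets = (np.arange(n_levels) + 0.5) / n_levels
    return np.interp(targets, cdf, edges)

def quantize(samples, levels):
    """Map each sample to the index of its nearest reconstruction level."""
    boundaries = (levels[:-1] + levels[1:]) / 2.0
    return np.searchsorted(boundaries, samples)

# Synthetic, skewed "coverage-like" values (hypothetical data, not ENCODE).
rng = np.random.default_rng(0)
values = rng.exponential(scale=10.0, size=100_000)

levels = design_levels(values, n_levels=16)
indices = quantize(values, levels)      # what a lossy encoder would store
reconstructed = levels[indices]         # what a decoder would recover
mse_nonuniform = float(np.mean((values - reconstructed) ** 2))

# Baseline for comparison: 16 evenly spaced levels over the same range.
uniform_levels = np.linspace(values.min(), values.max(), 16)
uniform_rec = uniform_levels[quantize(values, uniform_levels)]
mse_uniform = float(np.mean((values - uniform_rec) ** 2))
```

On skewed data such as this, the cube-root rule spends more levels where values are common, which lowers the mean-squared error relative to the evenly spaced baseline with the same number of levels.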

RESULTS

We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced the file sizes to merely 6% of the original, and offered a 6-fold improvement in compression rate over bigWig. The lossy feature further reduced file sizes 2-fold compared to the lossless mode, with little or no effect on peak calling and motif discovery using specialized NarrowPeaks methods. Compression and decompression speeds are on the order of 0.2 sec/MB on general-purpose computers.
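The reported averages can be turned into rough size and time estimates. The sketch below is back-of-the-envelope arithmetic only: it reads "6-fold improvement" as a ratio of compression rates (original size divided by compressed size), assumes the 0.2 sec/MB figure is measured against the uncompressed Wig size, and uses a hypothetical 1000 MB input file. None of the derived numbers are reported in the abstract, and the function name `estimate` is invented for this example.

```python
def estimate(original_mb: float) -> dict:
    """Rough estimates derived from the abstract's average figures."""
    lossless_mb = 0.06 * original_mb   # "6% of the original"
    lossy_mb = lossless_mb / 2.0       # lossy mode is a further 2-fold smaller
    # Assumption: bigWig's compression rate is one sixth of lossless ChIPWig's.
    chipwig_rate = original_mb / lossless_mb
    bigwig_mb = original_mb / (chipwig_rate / 6.0)
    # Assumption: 0.2 sec per MB of uncompressed input.
    seconds = 0.2 * original_mb
    return {
        "lossless_mb": lossless_mb,
        "lossy_mb": lossy_mb,
        "bigwig_mb": bigwig_mb,
        "seconds": seconds,
    }

est = estimate(1000.0)  # hypothetical 1000 MB Wig file
```

Under these assumptions, a 1000 MB Wig file comes out at about 60 MB in lossless mode, about 30 MB in lossy mode, and about 360 MB as bigWig, with compression taking roughly 200 seconds.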

AVAILABILITY AND IMPLEMENTATION

The source code and binaries, implemented in C++, are freely available for download at https://github.com/vidarmehr/ChIPWig-v2.

CONTACT

milenkov@illinois.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.
   Bioinformatics. 2018 Mar 15;34(6):911-919. doi: 10.1093/bioinformatics/btx685.
2. smallWig: parallel compression of RNA-seq WIG files.
   Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
3. mspack: efficient lossless and lossy mass spectrometry data compression.
   Bioinformatics. 2021 Nov 5;37(21):3923-3925. doi: 10.1093/bioinformatics/btab636.
4. FaStore: a space-saving solution for raw sequencing data.
   Bioinformatics. 2018 Aug 15;34(16):2748-2756. doi: 10.1093/bioinformatics/bty205.
5. AQUa: an adaptive framework for compression of sequencing quality scores with random access functionality.
   Bioinformatics. 2018 Feb 1;34(3):425-433. doi: 10.1093/bioinformatics/btx607.
6. ScaleQC: a scalable lossy to lossless solution for NGS data compression.
   Bioinformatics. 2020 Nov 1;36(17):4551-4559. doi: 10.1093/bioinformatics/btaa543.
7. CALQ: compression of quality values of aligned sequencing data.
   Bioinformatics. 2018 May 15;34(10):1650-1658. doi: 10.1093/bioinformatics/btx737.
8. RENANO: a REference-based compressor for NANOpore FASTQ files.
   Bioinformatics. 2021 Dec 11;37(24):4862-4864. doi: 10.1093/bioinformatics/btab437.
9. LCQS: an efficient lossless compression tool of quality scores with random access functionality.
   BMC Bioinformatics. 2020 Mar 18;21(1):109. doi: 10.1186/s12859-020-3428-7.
10. Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile.
    Bioinformatics. 2018 Jul 15;34(14):2356-2363. doi: 10.1093/bioinformatics/bty137.

Cited by

1. Productive visualization of high-throughput sequencing data using the SeqCode open portable platform.
   Sci Rep. 2021 Oct 1;11(1):19545. doi: 10.1038/s41598-021-98889-7.

References

1. Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation.
   Brief Bioinform. 2017 Mar 1;18(2):279-290. doi: 10.1093/bib/bbw023.
2. A comprehensive comparison of tools for differential ChIP-seq analysis.
   Brief Bioinform. 2016 Nov;17(6):953-966. doi: 10.1093/bib/bbv110. Epub 2016 Jan 13.
3. A Statistical Framework for the Analysis of ChIP-Seq Data.
   J Am Stat Assoc. 2011;106(495):891-903. doi: 10.1198/jasa.2011.ap09706. Epub 2012 Jan 24.
4. Entropy-scaling search of massive biological data.
   Cell Syst. 2015 Aug 26;1(2):130-140. doi: 10.1016/j.cels.2015.08.004.
5. smallWig: parallel compression of RNA-seq WIG files.
   Bioinformatics. 2016 Jan 15;32(2):173-80. doi: 10.1093/bioinformatics/btv561. Epub 2015 Sep 30.
6. Combinatorial activities of SHORT VEGETATIVE PHASE and FLOWERING LOCUS C define distinct modes of flowering regulation in Arabidopsis.
   Genome Biol. 2015 Feb 11;16(1):31. doi: 10.1186/s13059-015-0597-1.
7. CWig: compressed representation of Wiggle/BedGraph format.
   Bioinformatics. 2014 Sep 15;30(18):2543-50. doi: 10.1093/bioinformatics/btu330. Epub 2014 May 27.
8. Practical guidelines for the comprehensive analysis of ChIP-seq data.
   PLoS Comput Biol. 2013;9(11):e1003326. doi: 10.1371/journal.pcbi.1003326. Epub 2013 Nov 14.
9. Cistrome: an integrative platform for transcriptional regulation studies.
   Genome Biol. 2011 Aug 22;12(8):R83. doi: 10.1186/gb-2011-12-8-r83.
10. On the representability of complete genomes by multiple competing finite-context (Markov) models.
    PLoS One. 2011;6(6):e21588. doi: 10.1371/journal.pone.0021588. Epub 2011 Jun 30.