GReEn: a tool for efficient compression of genome resequencing data.

Affiliation

Signal Processing Lab, IEETA/DETI, University of Aveiro, 3810-193 Aveiro, Portugal.

Publication

Nucleic Acids Res. 2012 Feb;40(4):e27. doi: 10.1093/nar/gkr1124. Epub 2011 Dec 1.

DOI:10.1093/nar/gkr1124
PMID:22139935
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287168/
Abstract

Research in the genomic sciences faces sequencing and resequencing data volumes that grow faster than data storage and communication resources. This shifts a significant part of research budgets from the sequencing component of a project to the computational one. Hence, efficiently storing sequencing and resequencing data is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference genome sequence. It overcomes some drawbacks of the recently proposed tool GRS: it can compress sequences that GRS cannot handle, it runs faster, and it achieves compression gains of over 100-fold for some sequences. This tool is freely available for non-commercial use at ftp://ftp.ieeta.pt/~ap/codecs/GReEn1.tar.gz.
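The abstract describes compressing a resequenced genome against a reference genome. The sketch below illustrates only that general referential idea. It is not GReEn's actual method, which the abstract does not describe. The sketch encodes a target sequence as copy operations `(start, length)` taken from the reference, plus literal runs of bases for regions that do not match. The function names `encode` and `decode`, the k-mer index, and the greedy longest-match strategy are all assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical op format: a copy from the reference (start, length)
# or a literal string of bases that could not be matched.
Copy = Tuple[int, int]
Op = Union[Copy, str]


def encode(reference: str, target: str, k: int = 8) -> List[Op]:
    """Greedy referential encoding (illustrative sketch, not GReEn)."""
    # Index every k-mer of the reference by its start positions.
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)

    ops: List[Op] = []
    literal: List[str] = []
    pos = 0
    while pos < len(target):
        # Find the longest reference match starting at any seed k-mer hit.
        best_start, best_len = -1, 0
        for start in index.get(target[pos:pos + k], []):
            length = 0
            while (start + length < len(reference)
                   and pos + length < len(target)
                   and reference[start + length] == target[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= k:
            # Flush pending unmatched bases before emitting a copy.
            if literal:
                ops.append("".join(literal))
                literal = []
            ops.append((best_start, best_len))
            pos += best_len
        else:
            # No seed hit: store this base verbatim (e.g. a SNP).
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append("".join(literal))
    return ops


def decode(reference: str, ops: List[Op]) -> str:
    """Rebuild the target from the reference and the op list."""
    parts: List[str] = []
    for op in ops:
        if isinstance(op, str):
            parts.append(op)
        else:
            start, length = op
            parts.append(reference[start:start + length])
    return "".join(parts)


if __name__ == "__main__":
    ref = "AAGCTTGACCGTAGGCTAACGTTCAGGATCCA"
    tgt = ref[:16] + "G" + ref[17:]  # one substitution at position 16
    ops = encode(ref, tgt)
    print(ops)  # → [(0, 16), 'G', (17, 15)]
    assert decode(ref, ops) == tgt
```

A target that differs from the reference by a single substitution reduces to two copies and one literal base. This is why resequencing data, which is highly similar to an existing reference, can compress far better than the raw sequence. A practical tool would additionally entropy-code these operations. That step is omitted here.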

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea7f/3287168/fca0faff3bd1/gkr1124f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea7f/3287168/cac92111253a/gkr1124f1.jpg