Modified HuffBit Compress Algorithm - An Application of R.

Authors

Habib Nahida, Ahmed Kawsar, Jabin Iffat, Rahman Mohammad Motiur

Affiliations

Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail 1902, Bangladesh.

Department of Information and Communication Technology (ICT), Mawlana Bhashani Science and Technology University (MBSTU), Tangail, Bangladesh.

Publication information

J Integr Bioinform. 2018 Feb 22;15(3):20170057. doi: 10.1515/jib-2017-0057.

DOI:10.1515/jib-2017-0057
PMID:29470175
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6340127/
Abstract

Genomic sequence databases are growing at an explosive rate as the number of organisms they cover keeps increasing. Compressing deoxyribonucleic acid (DNA) sequences is therefore an important task, since these databases are approaching their storage limits. Various algorithms have been developed for DNA sequence compression. One efficient algorithm that works on both repetitive and non-repetitive sequences, "HuffBit Compress", is based on the concept of the extended binary tree. This paper proposes and develops a modified version of "HuffBit Compress", implemented in the R language, to compress and decompress DNA sequences. The modified version always achieves the best-case compression ratio, at the cost of 6 bits more than the best case of the original "HuffBit Compress" algorithm, and is named the "Modified HuffBit Compress Algorithm". The algorithm builds an extended binary tree based on Huffman codes and the most frequently occurring bases (A, C, G, T). In experiments on 6 sequences, the proposed algorithm gives approximately 16.18 % improvement in compression ratio over the "HuffBit Compress" algorithm and 11.12 % improvement in compression ratio over the "2-Bits Encoding Method".
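
The idea summarized in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only (the paper's implementation is in R and is not reproduced here) and rests on assumptions that the abstract does not confirm: that the extended binary tree yields the prefix codes `0`, `10`, `110`, `111`; that the shortest code always goes to the most frequent base; and that the 6 extra bits form a header storing the base ordering as three 2-bit symbols. The names `rank_bases`, `compress`, and `decompress` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
# Assumed leaf codes of a skewed extended binary tree, from most to least
# frequent base. This shape is an assumption, not taken from the paper.
CODES = ["0", "10", "110", "111"]
# Fixed 2-bit symbols, used here only to write the base ordering into the header.
TWO_BIT = {"A": "00", "C": "01", "G": "10", "T": "11"}


def rank_bases(seq):
    """Return the four bases ordered from most to least frequent in seq."""
    counts = Counter(seq)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(BASES, key=lambda b: (-counts[b], b))


def compress(seq):
    """Encode a sequence over A, C, G, T as a string of '0'/'1' characters."""
    ranked = rank_bases(seq)
    table = dict(zip(ranked, CODES))
    # Assumed 6-bit header: the three most frequent bases, 2 bits each.
    # The fourth base is implied, so it does not need to be stored.
    header = "".join(TWO_BIT[b] for b in ranked[:3])
    body = "".join(table[b] for b in seq)
    return header + body


def decompress(bits):
    """Invert compress(): read the 6-bit header, then decode the prefix codes."""
    symbol_to_base = {v: k for k, v in TWO_BIT.items()}
    ranked = [symbol_to_base[bits[i:i + 2]] for i in (0, 2, 4)]
    ranked.append(next(b for b in BASES if b not in ranked))
    decode = dict(zip(CODES, ranked))
    out, current = [], ""
    for bit in bits[6:]:
        current += bit
        # The codes are prefix-free, so the first match is the right one.
        if current in decode:
            out.append(decode[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    seq = "AAAAAAAACCCCGGT"
    packed = compress(seq)
    assert decompress(packed) == seq
    print(len(packed), "bits, versus", 2 * len(seq), "bits with 2-bit encoding")
```

Under these assumptions the saving depends on how skewed the base frequencies are: with four equally frequent bases the codes average (1 + 2 + 3 + 3) / 4 = 2.25 bits per base, which is more than a fixed 2-bit encoding, while a strongly dominant base pushes the average toward 1 bit per base.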

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/267944760385/jib-15-20170057-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/1b7f8775d650/jib-15-20170057-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/5b7b8ba76df9/jib-15-20170057-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/4a241005334d/jib-15-20170057-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/d3275801972c/jib-15-20170057-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/8febe5ecebc0/jib-15-20170057-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ad8/6340127/bb2af74f7314/jib-15-20170057-g007.jpg