Suppr超能文献

基于聚类的群体 DNA 序列压缩。

Clustering-Based Compression for Population DNA Sequences.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):208-221. doi: 10.1109/TCBB.2017.2762302. Epub 2017 Oct 12.

Abstract

Due to the advancement of DNA sequencing techniques, the number of sequenced individual genomes has experienced an exponential growth. Thus, effective compression of this kind of sequences is highly desired. In this work, we present a novel compression algorithm called Reference-based Compression algorithm using the concept of Clustering (RCC). The rationale behind RCC is based on the observation about the existence of substructures within the population sequences. To utilize these substructures, k-means clustering is employed to partition sequences into clusters for better compression. A reference sequence is then constructed for each cluster so that sequences in that cluster can be compressed by referring to this reference sequence. The reference sequence of each cluster is also compressed with reference to a sequence which is derived from all the reference sequences. Experiments show that RCC can further reduce the compressed size by up to 91.0 percent when compared with state-of-the-art compression approaches. There is a compromise between compressed size and processing time. The current implementation in Matlab has time complexity in a factor of thousands higher than the existing algorithms implemented in C/C++. Further investigation is required to improve processing time in future.

摘要

由于 DNA 测序技术的进步,测序个体基因组的数量呈指数级增长。因此,非常需要有效地压缩这种序列。在这项工作中,我们提出了一种新的压缩算法,称为基于参考的聚类压缩算法(RCC)。RCC 的基本原理是基于对群体序列中存在子结构的观察。为了利用这些子结构,使用 k-均值聚类将序列划分为簇,以实现更好的压缩。然后为每个簇构建一个参考序列,以便可以通过引用该参考序列来压缩该簇中的序列。还使用源自所有参考序列的序列来压缩每个簇的参考序列。实验表明,与最先进的压缩方法相比,RCC 可以将压缩后的大小进一步减少 91.0%。在压缩大小和处理时间之间存在折衷。目前在 Matlab 中的实现的时间复杂度比用 C/C++ 实现的现有算法高几千倍。需要进一步研究以提高未来的处理时间。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验