AllSome Sequence Bloom Trees.

Authors

Sun Chen, Harris Robert S, Chikhi Rayan, Medvedev Paul

Affiliations

1 Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania.

2 Department of Biology, Pennsylvania State University, University Park, Pennsylvania.

Publication information

J Comput Biol. 2018 May;25(5):467-479. doi: 10.1089/cmb.2017.0258. Epub 2018 Apr 5.

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39% to 85%, at the cost of up to 3× memory consumption during queries. Notably, it can process a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.
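
For readers unfamiliar with the kind of query the abstract refers to, the following is a minimal sketch of the underlying idea only: each sample's k-mers are stored in a Bloom filter, and a sample is reported when a large enough fraction of the query transcript's k-mers is found in its filter. It scans the samples one by one and does not implement the tree of the Sequence Bloom Tree or the AllSome improvement. The class and function names, the filter size, the number of hash functions, and the threshold parameter `theta` are all illustrative assumptions, not taken from the article.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class BloomFilter:
    """Toy Bloom filter; sizes here are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_filter(reads, k):
    """Insert every k-mer of every read of one sample into a filter."""
    bf = BloomFilter()
    for read in reads:
        for km in kmers(read, k):
            bf.add(km)
    return bf


def query(filters, transcript, k, theta):
    """Return names of samples whose filter contains at least a
    fraction theta of the transcript's k-mers (hypothetical threshold)."""
    query_kmers = kmers(transcript, k)
    if not query_kmers:
        return []
    hits = []
    for name, bf in filters.items():
        found = sum(1 for km in query_kmers if km in bf)
        if found >= theta * len(query_kmers):
            hits.append(name)
    return hits


# Two made-up samples: sample_A contains the transcript, sample_B does not.
filters = {
    "sample_A": build_filter(["TTACGTACGTTAGCAA"], k=4),
    "sample_B": build_filter(["GGGGGGGGCCCCCCCC"], k=4),
}
hits = query(filters, "ACGTACGTTAGC", k=4, theta=0.8)
```

Because a Bloom filter can report k-mers that were never inserted, a sample returned this way only potentially contains the transcript, which matches the hedged wording "potentially expressed" in the abstract.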
