A parallel algorithm for N-way interval set intersection.

Author information

Layer Ryan M, Quinlan Aaron R

Affiliations

Department of Human Genetics, University of Utah, Salt Lake City, UT, 84112.

Department of Human Genetics, University of Utah, Salt Lake City, UT, 84112. Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84112.

Publication information

Proc IEEE Inst Electr Electron Eng. 2017 Mar;105(3):542-551. doi: 10.1109/JPROC.2015.2461494.

Abstract

The comparison of sets of genome intervals (e.g., genes, repeats, ChIP-seq peaks) is essential to genome research, especially as modern sequencing technologies enable ever larger and more complex experiments. Relationships between genomic features are commonly identified by their intersection: that is, if feature sets contain overlapping intervals, it is inferred that they share a common biological function or origin. Using this technique, researchers identify genomic regions that are common among multiple (or unique to individual) datasets. While there have been recent advances in algorithms for pairwise intersections between two sets of genomic intervals, far less progress has been made on the intersection of many sets of genomic intervals. Identifying intersections among many interval sets is particularly important when attempting to distill biological insights from the massive, multi-dimensional datasets that are common to modern genome research. For such analyses, speed and efficiency are crucial given the size and sheer number of datasets involved. To solve this problem, we present a novel "slice-then-sweep" algorithm that, given N interval sets, efficiently reveals the subset of intervals that are common to all N sets. We demonstrate that our algorithm is more efficient in the sequential case and has a vastly higher capacity for parallelization, with a 19x speedup over the existing algorithm.
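To make the N-way intersection problem concrete, the following is a minimal Python sketch of a plain single-pass endpoint sweep. It is not the authors' slice-then-sweep algorithm, whose details the abstract does not give; it only illustrates what "common to all N sets" means. The function name `n_way_intersection`, the half-open `(start, end)` representation, and the single-chromosome coordinate space are all assumptions made for illustration. The sketch reports the regions covered by every set rather than the original intervals that contribute to them.

```python
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) with start < end,
# all on one chromosome (a single integer coordinate axis).
Interval = Tuple[int, int]


def n_way_intersection(sets: List[List[Interval]]) -> List[Interval]:
    """Return the regions covered by at least one interval from every set.

    Illustrative baseline only: a sequential sweep over all endpoints,
    not the slice-then-sweep algorithm described in the abstract.
    """
    n = len(sets)
    events = []
    for set_id, intervals in enumerate(sets):
        for start, end in intervals:
            events.append((start, 1, set_id))  # 1 marks an interval start
            events.append((end, 0, set_id))    # 0 marks an interval end
    # At equal coordinates, ends (0) sort before starts (1), which matches
    # half-open semantics: [1, 5) and [5, 9) do not overlap.
    events.sort()

    depth = [0] * n   # number of currently open intervals per set
    covered = 0       # number of sets with at least one open interval
    region_start = 0
    result: List[Interval] = []

    for pos, kind, set_id in events:
        if kind == 1:
            depth[set_id] += 1
            if depth[set_id] == 1:
                covered += 1
                if covered == n:
                    region_start = pos  # every set now covers this position
        else:
            if depth[set_id] == 1:
                if covered == n:
                    # Close the shared region; merge it with the previous
                    # one when the two are directly adjacent.
                    if result and result[-1][1] == region_start:
                        result[-1] = (result[-1][0], pos)
                    else:
                        result.append((region_start, pos))
                covered -= 1
            depth[set_id] -= 1
    return result


if __name__ == "__main__":
    a = [(1, 10), (20, 30)]
    b = [(5, 25)]
    c = [(0, 7), (8, 22)]
    print(n_way_intersection([a, b, c]))  # [(5, 7), (8, 10), (20, 22)]
```

Because this sweep visits every endpoint in one global sorted order, it is inherently sequential; that limitation is the kind of bottleneck a more parallelizable approach would aim to avoid.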
