
Oculus: faster sequence alignment by streaming read compression.

Affiliation

Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.

Publication information

BMC Bioinformatics. 2012 Nov 13;13:297. doi: 10.1186/1471-2105-13-297.

DOI:10.1186/1471-2105-13-297
PMID:23148484
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3534618/
Abstract

BACKGROUND

Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves.

RESULTS

Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases.
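The compress, align, decompress idea described above can be illustrated with a minimal Python sketch. This is not Oculus's actual implementation (which is written in C++); it only assumes that reads with identical sequences can share one alignment. The names `compress_reads`, `align_with_compression`, and the `align` callback are hypothetical, introduced here for illustration.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

Read = Tuple[str, str]  # (read name, nucleotide sequence)
Hit = TypeVar("Hit")    # whatever the aligner returns for one sequence


def compress_reads(reads: Iterable[Read]) -> "OrderedDict[str, List[str]]":
    """Make a single pass over the reads, grouping read names by identical sequence."""
    groups: "OrderedDict[str, List[str]]" = OrderedDict()
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def align_with_compression(
    reads: Iterable[Read], align: Callable[[str], Hit]
) -> Dict[str, Hit]:
    """Align each distinct sequence once, then copy the result to every read that had it."""
    results: Dict[str, Hit] = {}
    for seq, names in compress_reads(reads).items():
        hit = align(seq)          # the expensive step runs once per distinct sequence
        for name in names:        # "decompression": restore one result per input read
            results[name] = hit
    return results
```

With this sketch, the aligner is called once per distinct sequence rather than once per read, so any saving depends on how redundant the input is. The sketch keeps every distinct sequence in memory and ignores quality scores and near-duplicate reads.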

CONCLUSIONS

Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio.
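To make the "nearly identical SAM output" step concrete, here is a hedged Python sketch of how one SAM record produced for a representative read might be expanded back into one record per original read. It is an assumption about the general approach, not a description of Oculus's code. The function name `expand_sam_line` is hypothetical; the field positions (QNAME first, FLAG second, QUAL eleventh) and the 0x10 reverse-strand flag come from the SAM format itself.

```python
from typing import Iterable, List, Optional, Tuple


def expand_sam_line(
    sam_line: str, originals: Iterable[Tuple[str, Optional[str]]]
) -> List[str]:
    """Copy one SAM alignment record to each original read that shared its sequence.

    `originals` holds (read name, quality string or None) for each collapsed read.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment record needs at least 11 mandatory fields")
    is_reverse = bool(int(fields[1]) & 0x10)  # FLAG bit 0x10: read is on the reverse strand
    records = []
    for name, qual in originals:
        rec = list(fields)
        rec[0] = name                          # QNAME: restore the original read name
        if qual is not None:
            # SAM stores QUAL reversed when the read is reverse-complemented.
            rec[10] = qual[::-1] if is_reverse else qual
        records.append("\t".join(rec))
    return records
```

Passing `None` as the quality leaves the representative record's QUAL field unchanged, which would suit FASTA input where no qualities exist.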

Figures 1 to 5:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/3534618/74ab6fd0b42a/1471-2105-13-297-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/3534618/37e0d1c9589a/1471-2105-13-297-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/3534618/f08bf7d3c7b3/1471-2105-13-297-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/3534618/fdac8f1f4d94/1471-2105-13-297-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/680b/3534618/f888c17e2e7d/1471-2105-13-297-5.jpg
