Fast and accurate mapping of Complete Genomics reads.

Author information

Lee Donghyuk, Hormozdiari Farhad, Xin Hongyi, Hach Faraz, Mutlu Onur, Alkan Can

Affiliations

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA.

Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA.

Publication information

Methods. 2015 Jun;79-80:3-10. doi: 10.1016/j.ymeth.2014.10.012. Epub 2014 Oct 22.

Abstract

Many recent advances in genomics, and the expectations of personalized medicine, are made possible by the power of high-throughput sequencing (HTS) to sequence large collections of human genomes. Tens of different sequencing technologies are currently available, and each HTS platform has different strengths and biases. This diversity makes it possible to use different technologies to correct for shortcomings, but it also requires different algorithms to be developed for each platform because of differences in data types and error models. The first problem to tackle in analyzing HTS data for resequencing applications is the read mapping stage. Many tools have been developed for the most popular HTS methods, but publicly available, open-source aligners are still lacking for the Complete Genomics (CG) platform. Unfortunately, Burrows-Wheeler based methods are not practical for CG data because of the gapped nature of the reads generated by this platform. Here we provide a sensitive read mapper (sirFAST) for the CG technology, based on the seed-and-extend paradigm, that can quickly map CG reads to a reference genome. We evaluate the performance and accuracy of sirFAST using both simulated and publicly available real data sets, showing high precision and recall rates.
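The seed-and-extend paradigm named in the abstract can be illustrated with a toy sketch. This is not sirFAST's implementation: it assumes ungapped reads, a plain k-mer hash index, and Hamming-distance verification, whereas CG reads are gapped and need a mapper that accounts for that structure. The function names (`build_index`, `hamming`, `map_read`) and the parameters `k` and `max_mismatches` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, index, k, max_mismatches):
    """Return sorted (start, mismatches) pairs for locations that verify.

    Toy model only: ungapped reads, substitutions only (an assumption,
    not the CG read structure or the sirFAST algorithm).
    """
    seen = set()   # candidate starts already verified
    hits = []
    # Seed: non-overlapping k-mers of the read propose candidate starts.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            if start in seen:
                continue
            seen.add(start)
            # Extend: compare the whole read against the reference window.
            window = reference[start:start + len(read)]
            mismatches = hamming(read, window)
            if mismatches <= max_mismatches:
                hits.append((start, mismatches))
    return sorted(hits)
```

Splitting the read into non-overlapping seeds means that a read with at most `max_mismatches` substitutions still has at least one exact seed whenever the number of seeds exceeds `max_mismatches` (pigeonhole principle), so the true location is proposed as a candidate before the more expensive verification step runs.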
