Fast and accurate read alignment for resequencing.

Affiliation

Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

Publication details

Bioinformatics. 2012 Sep 15;28(18):2366-73. doi: 10.1093/bioinformatics/bts450. Epub 2012 Jul 18.

PMID:22811546

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3436849/

Abstract

MOTIVATION

Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. Accurately aligning reads that contain large indels is computationally challenging.
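Why are indels the hard part? Gapped alignment has to be scored by dynamic programming, not by simple base-by-base comparison. The sketch below illustrates the generic textbook technique: semi-global alignment with affine gap penalties (Gotoh's algorithm), where the whole read must align but may start and end anywhere in a reference window. This is not SeqAlto's algorithm, which the abstract does not describe. The function name `align_read` and the scoring values (match 2, mismatch -3, gap open 5, gap extension 1) are hypothetical choices for illustration only.

```python
from itertools import groupby
from typing import List, Tuple

NEG = float("-inf")


def align_read(read: str, ref: str, match: int = 2, mismatch: int = -3,
               gap_open: int = 5, gap_ext: int = 1) -> Tuple[float, int, str]:
    """Semi-global alignment with affine gap penalties (Gotoh's algorithm).

    Hypothetical helper for illustration. Every base of `read` must be
    aligned, but the alignment may start and end anywhere in `ref`.
    A gap of length L costs gap_open + (L - 1) * gap_ext.
    Returns (score, 0-based start in ref, CIGAR string using M/I/D).
    """
    n, m = len(read), len(ref)
    # Layer 0 = M (read base vs ref base), 1 = D (ref base vs gap),
    # 2 = I (read base vs gap). S holds scores, P the best predecessor layer.
    S = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    P = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    for j in range(m + 1):
        S[0][0][j] = 0  # leading reference bases are free

    for i in range(1, n + 1):
        for j in range(m + 1):
            # I: read[i-1] against a gap (consumes the read only).
            c = (S[0][i - 1][j] - gap_open, S[1][i - 1][j] - gap_open,
                 S[2][i - 1][j] - gap_ext)
            k = c.index(max(c))
            S[2][i][j], P[2][i][j] = c[k], k
            if j == 0:
                continue
            # M: read[i-1] against ref[j-1].
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            c = (S[0][i - 1][j - 1], S[1][i - 1][j - 1], S[2][i - 1][j - 1])
            k = c.index(max(c))
            S[0][i][j], P[0][i][j] = c[k] + sub, k
            # D: ref[j-1] against a gap (consumes the reference only).
            c = (S[0][i][j - 1] - gap_open, S[1][i][j - 1] - gap_ext,
                 S[2][i][j - 1] - gap_open)
            k = c.index(max(c))
            S[1][i][j], P[1][i][j] = c[k], k

    # Trailing reference bases are free: take the best cell in the last row.
    score, state, j = max((S[s][n][j], s, j) for s in (0, 2)
                          for j in range(m + 1))

    # Trace back through the stored predecessor layers.
    ops: List[str] = []
    i = n
    while i > 0:
        prev = P[state][i][j]
        if state == 0:
            ops.append("M")
            i -= 1
            j -= 1
        elif state == 1:
            ops.append("D")
            j -= 1
        else:
            ops.append("I")
            i -= 1
        state = prev
    ops.reverse()
    cigar = "".join(f"{len(list(g))}{op}" for op, g in groupby(ops))
    return score, j, cigar


if __name__ == "__main__":
    ref = "GGGG" + "ACGTGCAG" + "TTT" + "CATGACCA" + "GGGG"
    read = "ACGTGCAG" + "CATGACCA"  # the read lacks the reference "TTT"
    print(align_read(read, ref))  # (25, 4, '8M3D8M')
```

The affine penalty makes one long gap cheaper than several short ones, which is what lets a 3 bp deletion be reported as a single `3D` event. The cost is a table of size read length × window length per candidate location, which is why aligners generally use an index to narrow down candidate positions before running this kind of extension step.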

RESULTS

We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data.

AVAILABILITY

Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto

CONTACT

whwong@stanford.edu.

Similar articles

Fast and accurate read alignment for resequencing.

Bioinformatics. 2012 Sep 15;28(18):2366-73. doi: 10.1093/bioinformatics/bts450. Epub 2012 Jul 18.

Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.

SRmapper: a fast and sensitive genome-hashing alignment tool.

Bioinformatics. 2013 Feb 1;29(3):316-21. doi: 10.1093/bioinformatics/bts712. Epub 2012 Dec 24.

Comparative analysis of algorithms for next-generation sequencing read alignment.

Bioinformatics. 2011 Oct 15;27(20):2790-6. doi: 10.1093/bioinformatics/btr477. Epub 2011 Aug 19.

Ψ-RA: a parallel sparse index for genomic read alignment.

BMC Genomics. 2011;12 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-12-S2-S7. Epub 2011 Jul 27.

ARYANA: Aligning Reads by Yet Another Approach.

BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S12. doi: 10.1186/1471-2105-15-S9-S12. Epub 2014 Sep 10.

BFAST: an alignment tool for large scale genome resequencing.

PLoS One. 2009 Nov 11;4(11):e7767. doi: 10.1371/journal.pone.0007767.

Accurate estimation of short read mapping quality for next-generation genome sequencing.

Bioinformatics. 2012 Sep 15;28(18):i349-i355. doi: 10.1093/bioinformatics/bts408.

Fast and accurate long-read alignment with Burrows-Wheeler transform.

Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15.

SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees.

Gigascience. 2020 Apr 1;9(4). doi: 10.1093/gigascience/giaa023.

Cited by

PVGA: a precise viral genome assembler using an iterative alignment graph.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf063.

Site-directed mutagenesis of Mycobacterium tuberculosis and functional validation to investigate potential bedaquiline resistance-causing mutations.

Sci Rep. 2023 Jun 6;13(1):9212. doi: 10.1038/s41598-023-35563-0.

The electronic tree of life (eToL): a net of long probes to characterize the microbiome from RNA-seq data.

BMC Microbiol. 2022 Dec 22;22(1):317. doi: 10.1186/s12866-022-02671-2.

The demographic history of house mice (Mus musculus domesticus) in eastern North America.

G3 (Bethesda). 2023 Feb 9;13(2). doi: 10.1093/g3journal/jkac332.

From Samples to Germline and Somatic Sequence Variation: A Focus on Next-Generation Sequencing in Melanoma Research.

Life (Basel). 2022 Nov 21;12(11):1939. doi: 10.3390/life12111939.

Dosage sensitivity and exon shuffling shape the landscape of polymorphic duplicates in Drosophila and humans.

Nat Ecol Evol. 2022 Mar;6(3):273-287. doi: 10.1038/s41559-021-01614-w. Epub 2021 Dec 30.

Technology dictates algorithms: recent developments in read alignment.

Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7.

Non-antibiotic pharmaceuticals promote the transmission of multidrug resistance plasmids through intra- and intergenera conjugation.

ISME J. 2021 Sep;15(9):2493-2508. doi: 10.1038/s41396-021-00945-7. Epub 2021 Mar 10.

SRPRISM (Single Read Paired Read Indel Substitution Minimizer): an efficient aligner for assemblies with explicit guarantees.

Gigascience. 2020 Apr 1;9(4). doi: 10.1093/gigascience/giaa023.

Detect accessible chromatin using ATAC-sequencing, from principle to applications.

Hereditas. 2019 Aug 15;156:29. doi: 10.1186/s41065-019-0105-9. eCollection 2019.
