Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.

Affiliations

1. Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

2. Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA.

Publication information

Microb Genom. 2017 Jul 8;3(9):e000122. doi: 10.1099/mgen.0.000122. eCollection 2017 Sep.

DOI:10.1099/mgen.0.000122

PMID:29114401

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5643015/

Abstract

As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. or ) and one minority member (i.e. human or the endosymbiont Bm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In , at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
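
The recommendation in the abstract (one combined reference plus a 23 nt seed length) can be sketched as a small Python helper that concatenates FASTA files and builds the corresponding bwa command lines. This is a minimal sketch, not the authors' pipeline: the function names, the file names in the usage note, and the thread count are assumptions made for illustration. The only bwa options used are `bwa index`, and `bwa mem` with `-k` (minimum seed length) and `-t` (threads).

```python
from pathlib import Path
from typing import List, Sequence


def combine_references(fasta_paths: Sequence[Path], combined_path: Path) -> Path:
    """Concatenate several FASTA files into one combined reference file."""
    with combined_path.open("w") as out:
        for path in fasta_paths:
            text = path.read_text()
            out.write(text)
            # Guard against a missing trailing newline, which would glue the
            # last sequence line of one file onto the next file's header.
            if text and not text.endswith("\n"):
                out.write("\n")
    return combined_path


def bwa_index_command(reference: Path) -> List[str]:
    """Build the command that indexes the combined reference."""
    return ["bwa", "index", str(reference)]


def bwa_mem_command(
    reference: Path,
    reads_1: Path,
    reads_2: Path,
    seed_length: int = 23,  # seed length suggested in the abstract
    threads: int = 4,       # hypothetical value, adjust to the machine
) -> List[str]:
    """Build a paired-end bwa mem command with an explicit minimum seed length."""
    return [
        "bwa", "mem",
        "-k", str(seed_length),
        "-t", str(threads),
        str(reference), str(reads_1), str(reads_2),
    ]
```

With bwa installed, the returned lists could be passed to `subprocess.run`, for example after calling `combine_references([Path("human.fa"), Path("majority_member.fa")], Path("combined.fa"))` (hypothetical file names), with the standard output of `bwa mem` redirected to a SAM file. Building the commands as lists rather than shell strings avoids quoting problems with unusual file paths.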

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f985/5643015/c8d860f7c54e/mgen-3-122-g001.jpg
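
The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that both separate-reference percentages are fractions of the same read set; under that assumption, a total above 100 % implies that at least the excess fraction of reads mapped to both references. The variable and function names are illustrative, and "majority" stands for the majority-member genome.

```python
# Figures quoted in the abstract for the separate-reference runs.
PCT_HUMAN = 24.1       # % of reads mapped to the human genome
PCT_MAJORITY = 83.6    # % of reads mapped to the majority-member genome
CPU_HUMAN = 1.7        # CPU hours, human reference
CPU_MAJORITY = 0.2     # CPU hours, majority-member reference

# Figures quoted for the combined reference.
PCT_COMBINED = 97.1
CPU_COMBINED = 0.7


def summarize() -> dict:
    """Recompute the totals and derive two simple comparisons."""
    total_pct = PCT_HUMAN + PCT_MAJORITY
    total_cpu = CPU_HUMAN + CPU_MAJORITY
    return {
        "total_pct": round(total_pct, 1),
        "total_cpu_hours": round(total_cpu, 1),
        # Inclusion-exclusion: the union cannot exceed 100 %, so the excess
        # is a lower bound on reads that mapped to both references.
        "min_pct_mapped_to_both": round(max(0.0, total_pct - 100.0), 1),
        # How many times more CPU the two separate runs used in total.
        "cpu_ratio_separate_vs_combined": round(total_cpu / CPU_COMBINED, 1),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The lower bound only says that some reads were counted twice across the two separate runs; it does not say which of the two alignments was the incorrect one.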

Similar articles

Faster single-end alignment generation utilizing multi-thread for BWA.

Biomed Mater Eng. 2015;26 Suppl 1:S1791-6. doi: 10.3233/BME-151480.

A fast read alignment method based on seed-and-vote for next generation sequencing.

BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):466. doi: 10.1186/s12859-016-1329-6.

Multi-threading the generation of Burrows-Wheeler Alignment.

Genet Mol Res. 2016 May 23;15(2):gmr8650. doi: 10.4238/gmr.15028650.

Evaluation and assessment of read-mapping by multiple next-generation sequencing aligners based on genome-wide characteristics.

Genomics. 2017 Jul;109(3-4):186-191. doi: 10.1016/j.ygeno.2017.03.001. Epub 2017 Mar 9.

Long read alignment based on maximal exact match seeds.

Bioinformatics. 2012 Sep 15;28(18):i318-i324. doi: 10.1093/bioinformatics/bts414.

Re-alignment of the unmapped reads with base quality score.

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S8. doi: 10.1186/1471-2105-16-S5-S8. Epub 2015 Mar 18.

CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform.

Bioinformatics. 2012 Jul 15;28(14):1830-7. doi: 10.1093/bioinformatics/bts276. Epub 2012 May 9.

CLAST: CUDA implemented large-scale alignment search tool.

BMC Bioinformatics. 2014 Dec 11;15(1):406. doi: 10.1186/s12859-014-0406-y.

PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead.

Genes (Basel). 2019 Nov 4;10(11):886. doi: 10.3390/genes10110886.

Cited by

Pangenomic and Phenotypic Characterization of Colombian Germplasm Reveals the Genetic Basis of Fruit Quality Traits.

Int J Mol Sci. 2025 Aug 23;26(17):8205. doi: 10.3390/ijms26178205.

Discovery of variation in genes related to agronomic traits by sequencing the genome of Cucurbita pepo varieties.

BMC Genomics. 2025 Apr 3;26(1):335. doi: 10.1186/s12864-025-11370-x.

Discarded sequencing reads uncover natural variation in pest resistance in .

Elife. 2024 Dec 19;13:RP95510. doi: 10.7554/eLife.95510.

A metagenomic approach to demystify the anaerobic digestion black box and achieve higher biogas yield: a review.

Front Microbiol. 2024 Oct 11;15:1437098. doi: 10.3389/fmicb.2024.1437098. eCollection 2024.

Interchromosomal segmental duplication drives translocation and loss of histidine-rich protein 3.

Elife. 2024 Oct 7;13:RP93534. doi: 10.7554/eLife.93534.

SigAlign: an alignment algorithm guided by explicit similarity criteria.

Nucleic Acids Res. 2024 Aug 27;52(15):8717-8733. doi: 10.1093/nar/gkae607.

Characterizing genetic variation on the Z chromosome in Schistosoma japonicum reveals host-parasite co-evolution.

Parasit Vectors. 2024 May 8;17(1):207. doi: 10.1186/s13071-024-06250-4.

Complete genome sequence of BBC32B isolated from human feces sample.

Microbiol Resour Announc. 2023 Nov 16;12(11):e0064523. doi: 10.1128/MRA.00645-23. Epub 2023 Oct 11.

A genome-wide CRISPR screen maps endogenous regulators of PPARG gene expression in bladder cancer.

iScience. 2023 Mar 30;26(5):106525. doi: 10.1016/j.isci.2023.106525. eCollection 2023 May 19.

Recovering High-Quality Host Genomes from Gut Metagenomic Data through Genotype Imputation.

Adv Genet (Hoboken). 2022 May 6;3(3):2100065. doi: 10.1002/ggn2.202100065. eCollection 2022 Sep.
