Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.

Affiliations

1 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

2 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA.

Publication information

Microb Genom. 2017 Jul 8;3(9):e000122. doi: 10.1099/mgen.0.000122. eCollection 2017 Sep.

DOI:10.1099/mgen.0.000122
PMID:29114401
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5643015/
Abstract

As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. or ) and one minority member (i.e. human or the endosymbiont Bm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In , at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
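
A rough calculation shows why a longer minimum seed can suppress spurious cross-species hits. The sketch below assumes, purely for illustration, that reference bases are independent and uniformly distributed (real genomes are far from this); under that assumption a fixed k-mer is expected to occur by chance about 2G/4^k times in a reference of G bases, counting both strands. The genome size is an approximate illustrative value and the function name `expected_random_hits` is hypothetical; this is not how bwa-mem itself selects seeds.

```python
def expected_random_hits(genome_size: int, seed_len: int) -> float:
    """Expected chance occurrences of one fixed k-mer in a random genome.

    Toy model (assumption): bases are i.i.d. and uniform, so each of the
    ~2 * genome_size positions (forward plus reverse strand) matches the
    k-mer with probability 4**-seed_len.
    """
    return 2 * genome_size / 4 ** seed_len


# Approximate haploid human genome size, used only as an illustrative value.
GENOME_BP = 3_100_000_000

for k in (18, 19, 23):  # 19 is the bwa-mem default minimum seed length (-k)
    print(f"k={k}: {expected_random_hits(GENOME_BP, k):.2e} expected chance hits")

# Ratio between the 18 nt and 23 nt seed lengths discussed in the abstract.
fold = expected_random_hits(GENOME_BP, 18) / expected_random_hits(GENOME_BP, 23)
print(f"18 nt -> 23 nt: about {fold:.0f}-fold fewer chance hits")
```

Under this toy model each additional base divides the expected number of chance matches by four, so going from 18 nt to 23 nt lowers it by a factor of 4^5 = 1024. That direction agrees with the abstract's observation, but the model ignores repeats, base composition, and bwa-mem's seed extension and scoring, so it should not be read as a prediction of actual mapping rates.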

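The percentages above also show why mapping to separate references overstates the total: a read can align to both genomes, so 24.1 % + 83.6 % gives 107.7 %. The sketch below uses made-up alignment scores for four hypothetical reads to contrast counting hits per reference with assigning each read once to its best-scoring genome, which is roughly what a single combined reference allows. The data, function names, and best-score rule are illustrative assumptions; real aligners also handle ties and report mapping quality, which this ignores.

```python
from typing import Dict, Optional

# Hypothetical best alignment score of each read against each reference;
# None means no alignment. All values are invented for illustration.
Scores = Dict[str, Optional[int]]

reads: Dict[str, Scores] = {
    "r1": {"majority": 150, "minority": 60},    # weak spurious hit on minority
    "r2": {"majority": 148, "minority": None},
    "r3": {"majority": None, "minority": 145},
    "r4": {"majority": 120, "minority": 40},    # weak spurious hit on minority
}


def separate_mapping_rates(read_scores: Dict[str, Scores]) -> Dict[str, float]:
    """Percent of reads aligning to each reference when searched one at a time."""
    refs = sorted({ref for scores in read_scores.values() for ref in scores})
    return {
        ref: 100 * sum(s.get(ref) is not None for s in read_scores.values()) / len(read_scores)
        for ref in refs
    }


def best_hit_assignment(read_scores: Dict[str, Scores]) -> Dict[str, Optional[str]]:
    """Assign each read to its single highest-scoring reference (ties: first seen).

    A rough stand-in for aligning against one concatenated reference.
    """
    assigned_refs: Dict[str, Optional[str]] = {}
    for read_id, scores in read_scores.items():
        hits = {ref: sc for ref, sc in scores.items() if sc is not None}
        assigned_refs[read_id] = max(hits, key=hits.__getitem__) if hits else None
    return assigned_refs


separate = separate_mapping_rates(reads)
print("separate references:", separate, "total", sum(separate.values()))

assigned = best_hit_assignment(reads)
counts = {ref: sum(a == ref for a in assigned.values()) for ref in separate}
print("combined reference:", {ref: 100 * c / len(reads) for ref, c in counts.items()})
```

In this toy data set the per-reference percentages add up to 150 % because r1 and r4 also produce weak hits on the minority genome, while best-hit assignment places 75 % of reads on the majority genome and 25 % on the minority one, each read counted exactly once.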

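The recommended set-up (one combined reference and a 23 nt seed length) can be scripted along the following lines. This is a minimal sketch, assuming paired-end FASTQ input and the standard BWA command-line tool: `bwa index` builds the index and `bwa mem -k` sets the minimum seed length (default 19). File names, the thread count, and the helper functions are hypothetical, and the abstract does not state which other options the authors used.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def concatenate_fasta(inputs: Sequence[Path], output: Path) -> None:
    """Write every input FASTA into one combined reference FASTA.

    Assumes sequence names are unique across genomes; duplicated names
    would make alignments to the combined reference ambiguous.
    """
    with output.open("w") as out:
        for path in inputs:
            text = path.read_text()
            out.write(text)
            # Keep the next file's first header on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")


def bwa_commands(reference: Path, read1: Path, read2: Path,
                 seed_len: int = 23, threads: int = 4) -> List[List[str]]:
    """Build (but do not run) the indexing and alignment command lines."""
    return [
        ["bwa", "index", str(reference)],
        ["bwa", "mem", "-k", str(seed_len), "-t", str(threads),
         str(reference), str(read1), str(read2)],
    ]


# Hypothetical file names, for illustration only.
combined = Path("combined_reference.fa")
# Uncomment once the per-genome FASTA files exist:
# concatenate_fasta([Path("majority.fa"), Path("minority.fa")], combined)
for cmd in bwa_commands(combined, Path("sample_R1.fq.gz"), Path("sample_R2.fq.gz")):
    print(shlex.join(cmd))
```

The helper only builds argument lists, so they can be inspected or logged before anything runs. `bwa mem` writes SAM to standard output, so when executing these commands with `subprocess.run` you would redirect that stream to a file, for example via its `stdout` argument.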
Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f985/5643015/c8d860f7c54e/mgen-3-122-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f985/5643015/1d12cc66d814/mgen-3-122-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f985/5643015/498ed68c9d4c/mgen-3-122-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f985/5643015/826c7595d535/mgen-3-122-g004.jpg
