SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments.
Author information
1 Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
2 Computing, Engineering and Mathematics, University of Brighton, Moulsecoomb, Brighton, BN2 4GJ, UK.
Publication information
Microb Genom. 2016 Apr 29;2(4):e000056. doi: 10.1099/mgen.0.000056. eCollection 2016 Apr.
Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory inefficient and are installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3.
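To make the task concrete, the following is a minimal Python sketch of what "extracting SNPs from a multi-FASTA alignment" means: keep only the alignment columns in which more than one of the bases A, C, G, T is observed. It is an illustration under stated assumptions, not the SNP-sites implementation (which is written in C and designed for low memory use). The function names `parse_fasta`, `snp_columns` and `extract_snps` are hypothetical, and the decision to ignore gaps and ambiguous characters when judging whether a column varies is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {name: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def snp_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based indices of columns with more than one distinct base.

    Assumption of this sketch: only A, C, G and T count as bases, so gaps
    and ambiguity codes never make a column variable on their own.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have equal length")
    sites = []
    for i in range(length):
        bases = {s[i] for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(i)
    return sites


def extract_snps(alignment: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return the variable column indices and the alignment reduced to them."""
    sites = snp_columns(alignment)
    reduced = {
        name: "".join(seq[i] for i in sites) for name, seq in alignment.items()
    }
    return sites, reduced


if __name__ == "__main__":
    example = ">s1\nACGTAC\n>s2\nACGTAT\n>s3\nATGTA-\n"
    sites, reduced = extract_snps(parse_fasta(example))
    print(sites)
    print(reduced)
```

This sketch holds the whole alignment in memory, which is exactly what becomes impractical for multi-gigabyte inputs; a tool aiming for the resource figures quoted above would need to stream the file rather than load it.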