Faison William J, Rostovtsev Alexandre, Castro-Nallar Eduardo, Crandall Keith A, Chumakov Konstantin, Simonyan Vahan, Mazumder Raja
The Department of Biochemistry & Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA.
Center for Biologics Evaluation and Research, US Food and Drug Administration, 1451 Rockville Pike, Rockville, MD 20852, USA.
Genomics. 2014 Jul;104(1):1-7. doi: 10.1016/j.ygeno.2014.06.001. Epub 2014 Jun 12.
Next-generation sequencing (NGS) data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples, and the differences can be used to build phylogenetic trees depicting the relatedness among the samples. In practice, however, this is difficult because there is currently no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. To meet this need, the PhyloSNP application was created with two analysis methods: (1) a qualitative method that either creates a presence/absence matrix, which can be used directly to generate phylogenetic trees, or builds a tree from a shrunk genome alignment (one that includes additional bases surrounding each SNP position); and (2) a quantitative method that clusters samples based on the frequency of the different bases found at a particular position. The algorithms were used to generate trees from Poliovirus, Burkholderia and human cancer genomics NGS datasets.
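The two clustering ideas above can be illustrated with a minimal sketch. This is not PhyloSNP's implementation, and the following are all assumptions made for illustration:

- the input layouts: a set of `(position, alt_base)` calls per sample, and a `{position: {base: frequency}}` profile per sample;
- the Hamming distance on presence/absence rows;
- the frequency distance: half the L1 distance between A/C/G/T frequency vectors, summed over positions;
- UPGMA as the tree-building step.

The function names are hypothetical and the sample data are toy values. The tree is returned as a topology-only Newick string, and the shrunk-genome-alignment option is not shown.

```python
from itertools import combinations


def presence_absence_matrix(snps_by_sample):
    """Build a 0/1 matrix from per-sample SNP calls.

    Assumed (hypothetical) input: {sample: {(position, alt_base), ...}}.
    Returns (sites, rows) where rows[sample][i] == 1 if the sample carries sites[i].
    """
    sites = sorted(set().union(*snps_by_sample.values()))
    rows = {name: [1 if site in calls else 0 for site in sites]
            for name, calls in snps_by_sample.items()}
    return sites, rows


def hamming(u, v):
    """Number of sites at which two presence/absence rows differ."""
    return sum(x != y for x, y in zip(u, v))


def frequency_distance(p, q):
    """Assumed distance between base-frequency profiles {position: {base: freq}}.

    Sums half the L1 distance between the A/C/G/T frequency vectors at each
    position; assumes both profiles cover the same positions.
    """
    total = 0.0
    for pos, fp in p.items():
        fq = q[pos]
        total += 0.5 * sum(abs(fp.get(b, 0.0) - fq.get(b, 0.0)) for b in "ACGT")
    return total


def upgma_newick(names, distance):
    """Cluster sample names with UPGMA and return a topology-only Newick string.

    names must be strings; distance(a, b) returns the distance between two samples.
    """
    clusters = {name: (name, 1) for name in names}  # key -> (newick, leaf count)
    d = {frozenset((a, b)): float(distance(a, b)) for a, b in combinations(names, 2)}
    next_id = 0
    while len(clusters) > 1:
        # Closest pair; ties broken by name so the result is deterministic.
        pair = min(d, key=lambda k: (d[k], sorted(k)))
        a, b = sorted(pair)
        (newick_a, size_a), (newick_b, size_b) = clusters.pop(a), clusters.pop(b)
        merged = f"_node{next_id}"  # internal label; assumes no sample uses it
        next_id += 1
        for other in clusters:
            # UPGMA update: leaf-count-weighted mean of the two old distances.
            d[frozenset((merged, other))] = (
                d.pop(frozenset((a, other))) * size_a
                + d.pop(frozenset((b, other))) * size_b
            ) / (size_a + size_b)
        del d[pair]
        clusters[merged] = (f"({newick_a},{newick_b})", size_a + size_b)
    [(newick, _)] = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    # Toy SNP calls (illustrative only).
    snps = {
        "S1": {(100, "A"), (250, "T")},
        "S2": {(100, "A"), (250, "T"), (400, "G")},
        "S3": {(900, "C")},
    }
    _, rows = presence_absence_matrix(snps)
    print(upgma_newick(list(rows), lambda a, b: hamming(rows[a], rows[b])))
    # (S3,(S1,S2));

    # Toy base-frequency profiles (illustrative only).
    freqs = {
        "P1": {10: {"A": 0.9, "G": 0.1}, 20: {"C": 1.0}},
        "P2": {10: {"A": 0.8, "G": 0.2}, 20: {"C": 1.0}},
        "P3": {10: {"G": 1.0}, 20: {"T": 1.0}},
    }
    print(upgma_newick(list(freqs), lambda a, b: frequency_distance(freqs[a], freqs[b])))
    # (P3,(P1,P2));
```

UPGMA is used here only because it is short to write. Neighbor-joining, or a character-based method applied directly to the presence/absence matrix, would be equally reasonable choices for the tree-building step.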
PhyloSNP is freely available for download at http://hive.biochemistry.gwu.edu/dna.cgi?cmd=phylosnp.