National Food Institute, Technical University of Denmark, 2800 Kgs Lyngby, Denmark.
BMC Genomics. 2012;13 Suppl 7(Suppl 7):S6. doi: 10.1186/1471-2164-13-S7-S6. Epub 2012 Dec 13.
Advances in whole genome sequencing (WGS) and its decreasing cost will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates show very little diversity, and extensive genomic analysis is required to differentiate and classify them. One successful and widely used method is the analysis of single nucleotide polymorphisms (SNPs). Currently, various tools and methods exist to identify SNPs, each with different options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard, simple and automatic tool to identify SNPs and construct phylogenetic trees from WGS data.
Here we introduce snpTree, a server for automatic online SNP analysis. The tool combines several SNP analysis suites with Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS data as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes with BWA, while contigs in fasta format are processed with Nucmer. SNPs are concatenated according to their position on the reference genome, and a tree is constructed from the concatenated SNPs using FastTree and a Perl script. The online server was implemented in HTML, Java and Python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). For the first three data sets, the results were consistent and concordant for both raw reads and assembled genomes. For M. tuberculosis, the original publication involved extensive filtering of SNPs, which could not be reproduced with snpTree.
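As a rough illustration of the concatenation step, the Python sketch below builds a SNP alignment from per-isolate SNP calls, keeping only reference positions where at least one isolate differs from the reference. It is not the snpTree Perl/Python code: the input format (isolate name mapped to 1-based reference positions and called bases), the hypothetical function names `concatenate_snps` and `to_fasta`, and the rule that uncalled positions carry the reference base are all assumptions made for illustration.

```python
from typing import Dict, List


def concatenate_snps(reference: str, snp_calls: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Hypothetical helper: concatenate SNPs ordered by reference position.

    reference: reference genome sequence (positions are 1-based).
    snp_calls: isolate name -> {1-based reference position: called base}.
    Assumption: a position not called in an isolate carries the reference base.
    """
    # Keep only positions where at least one isolate differs from the reference.
    positions: List[int] = sorted(
        {pos for calls in snp_calls.values() for pos, base in calls.items()
         if base != reference[pos - 1]}
    )
    # The reference itself is included as one row of the SNP alignment.
    alignment: Dict[str, str] = {
        "reference": "".join(reference[p - 1] for p in positions)
    }
    for isolate, calls in snp_calls.items():
        # Fill uncalled positions with the reference base (simplifying assumption).
        alignment[isolate] = "".join(calls.get(p, reference[p - 1]) for p in positions)
    return alignment


def to_fasta(alignment: Dict[str, str]) -> str:
    """Hypothetical helper: render the concatenated SNPs as multi-FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    calls = {"iso1": {3: "T"}, "iso2": {3: "T", 8: "A"}, "iso3": {}}
    print(to_fasta(concatenate_snps(ref, calls)), end="")
```

The resulting multi-FASTA text could then be passed to a tree builder such as FastTree (for nucleotide data, e.g. `FastTree -nt snps.fasta > tree.nwk`). Treating uncalled positions as reference is a simplification; a real pipeline would also have to handle low-coverage and filtered sites.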
The snpTree server is an easy-to-use option for rapid, standardised and automatic SNP analysis in epidemiological studies, including for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.