SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

Author information

Sarovich Derek S, Price Erin P

Affiliation

Global and Tropical Health Division, Menzies School of Health Research, Charles Darwin University, PO Box 41096, Casuarina 0811, NT, Australia.

Publication information

BMC Res Notes. 2014 Sep 8;7:618. doi: 10.1186/1756-0500-7-618.

Abstract

BACKGROUND

Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams.

FINDINGS

We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data.
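The SNP matrix step described above can be illustrated with a minimal sketch. This is not SPANDx's own code, which relies on VCFtools and GATK; it only shows the general idea of reducing per-sample SNP calls to a matrix of sites that are confidently called in every genome and variable in at least one, then writing that matrix in a relaxed PHYLIP-style layout. The input structure, the function names `build_snp_matrix` and `to_phylip`, and the rule of discarding any site with a missing call are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: base calls keyed by (contig, position).
# A missing key means the sample has no confident call at that site.
Site = Tuple[str, int]
Calls = Dict[Site, str]


def build_snp_matrix(reference: Calls, samples: Dict[str, Calls]):
    """Keep sites called in every sample and variable in at least one."""
    sites: List[Site] = []
    for site in sorted(reference):
        alleles = [calls.get(site) for calls in samples.values()]
        if any(allele is None for allele in alleles):
            continue  # assumed filter: drop sites lacking a call in any sample
        if all(allele == reference[site] for allele in alleles):
            continue  # invariant site, carries no phylogenetic signal
        sites.append(site)

    # One row per taxon, reference first, one column per retained site.
    rows = {"Reference": "".join(reference[s] for s in sites)}
    for name, calls in samples.items():
        rows[name] = "".join(calls[s] for s in sites)
    return sites, rows


def to_phylip(rows: Dict[str, str]) -> str:
    """Format rows as relaxed PHYLIP: a header line, then 'name  sequence'."""
    n_sites = len(next(iter(rows.values()))) if rows else 0
    width = max((len(name) for name in rows), default=0) + 2
    lines = [f"{len(rows)} {n_sites}"]
    for name, seq in rows.items():
        lines.append(f"{name.ljust(width)}{seq}")
    return "\n".join(lines)


# Toy data (invented for this example, not taken from the paper).
reference = {("chr1", 10): "A", ("chr1", 25): "C",
             ("chr1", 40): "G", ("chr1", 55): "T"}
samples = {
    "S1": {("chr1", 10): "G", ("chr1", 25): "C",
           ("chr1", 40): "G", ("chr1", 55): "T"},
    "S2": {("chr1", 10): "A", ("chr1", 25): "C",
           ("chr1", 40): "T"},  # no call at position 55
    "S3": {("chr1", 10): "G", ("chr1", 25): "C",
           ("chr1", 40): "G", ("chr1", 55): "C"},
}

sites, rows = build_snp_matrix(reference, samples)
phylip_text = to_phylip(rows)
print(phylip_text)
```

In this toy input, position 25 is dropped because it is invariant and position 55 is dropped because S2 has no call there, leaving two columns. A real pipeline would read the calls from VCF files and apply quality and coverage filters before this step.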

CONCLUSION

SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.


