Teagasc Animal and Bioscience Research Department, Animal & Grassland Research and Innovation Centre, Teagasc, Grange, Dunsany, Co. Meath, Ireland.
BMC Bioinformatics. 2013 Feb 8;14:45. doi: 10.1186/1471-2105-14-45.
Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process, allowing the identification of many thousands of mutations in both model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many of the available tools are optimised for organisms that are densely sampled for SNPs, such as humans. Few tools are currently available that are species-independent or that support data from non-model organisms.
Here we present SNPdat, a high-throughput analysis tool that can provide a comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. Using a dataset of 4,566 SNPs identified in cattle by high-throughput DNA sequencing, we demonstrate the annotations performed and the statistics that can be generated by SNPdat.
SNPdat provides users with a simple tool for annotation of genomes that are either not supported by other tools or have a small number of annotated SNPs available. SNPdat can also be used to analyse datasets from organisms which are densely sampled for SNPs. As a command line tool it can easily be incorporated into existing SNP discovery pipelines and fills a niche for analyses involving non-model organisms that are not supported by many available SNP annotation tools. SNPdat will be of great interest to scientists involved in SNP discovery and analysis projects, particularly those with limited bioinformatics experience.
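As a rough illustration of what positional SNP annotation against a genome annotation involves, the sketch below looks up which annotated features a SNP falls in and, failing that, reports the nearest feature on the same chromosome. It is not SNPdat's implementation or interface: the `Feature` and `Snp` types, the `annotate` function and the example coordinates are all hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GTF files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """One annotated interval (hypothetical type; 1-based inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    kind: str      # e.g. "exon" or "CDS"
    gene_id: str


@dataclass(frozen=True)
class Snp:
    """One SNP call (hypothetical type)."""
    chrom: str
    pos: int
    allele: str


def annotate(snp: Snp, features: List[Feature]) -> List[Tuple[Feature, int]]:
    """Return (feature, distance) pairs for a SNP.

    Overlapping features are returned with distance 0. If none overlap,
    the single nearest feature on the same chromosome is returned with
    its distance in bases. An empty list means the chromosome has no
    annotated features at all.
    """
    same_chrom = [f for f in features if f.chrom == snp.chrom]
    hits = [f for f in same_chrom if f.start <= snp.pos <= f.end]
    if hits:
        return [(f, 0) for f in hits]
    if not same_chrom:
        return []

    def distance(f: Feature) -> int:
        # The SNP lies strictly outside f here, so one of the two gaps applies.
        return f.start - snp.pos if snp.pos < f.start else snp.pos - f.end

    nearest = min(same_chrom, key=distance)
    return [(nearest, distance(nearest))]


# Made-up example data, for illustration only.
features = [
    Feature("chr1", 100, 200, "exon", "geneA"),
    Feature("chr1", 500, 600, "exon", "geneB"),
]
inside = annotate(Snp("chr1", 150, "A"), features)
nearby = annotate(Snp("chr1", 450, "G"), features)
unplaced = annotate(Snp("chr2", 10, "T"), features)
```

A linear scan keeps the sketch short; for many thousands of SNPs, sorting the features per chromosome and using binary search or an interval tree would be the more usual choice.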