Torkamaneh Davoud, Laroche Jérôme, Bastien Maxime, Abed Amina, Belzile François
Département de Phytologie, Université Laval, Quebec City, QC, Canada.
Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada.
BMC Bioinformatics. 2017 Jan 3;18(1):5. doi: 10.1186/s12859-016-1431-9.
Next-generation sequencing (NGS) technologies have considerably accelerated the investigation of the composition of genomes and their functions. Genotyping-by-sequencing (GBS) is a genotyping approach that uses NGS to rapidly and economically scan a genome. It has been shown to allow the simultaneous discovery and genotyping of thousands to millions of SNPs across a wide range of species. For most users, the main challenge in GBS is the bioinformatics analysis of the large amount of sequence data produced by sequencing GBS libraries in order to call alleles at SNP loci. Herein we describe a new GBS bioinformatics pipeline, Fast-GBS, designed to provide highly accurate genotyping, to require modest computing resources, and to offer ease of use.
Fast-GBS is built upon standard bioinformatics languages and file formats, can handle data from different sequencing platforms, and can detect different kinds of variants (SNPs, MNPs, and indels). To illustrate its performance, we called variants in three collections of samples (soybean, barley, and potato) that cover a range of genome sizes, levels of genome complexity, and ploidy. Within these small sets of samples, we called 35 k, 32 k, and 38 k SNPs for soybean, barley, and potato, respectively. To assess genotype accuracy, we compared these GBS-derived SNP genotypes with independent data sets obtained from whole-genome sequencing or SNP arrays. This analysis yielded estimated accuracies of 98.7%, 95.2%, and 94% for soybean, barley, and potato, respectively.
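The accuracy estimates above come from comparing GBS-derived genotypes with an independent data set. The abstract does not specify how that comparison was computed, so the following is only a minimal sketch of one common approach, a genotype concordance rate over calls present in both data sets. The function name `genotype_concordance`, the `(sample, chromosome, position)` key layout, and the toy genotypes are all hypothetical and are not taken from Fast-GBS.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout (an assumption, not the Fast-GBS format):
# key   = (sample, chromosome, position)
# value = tuple of alleles, or None when the genotype is missing.
# A tuple of any length works, so diploid and tetraploid calls are both covered.
Key = Tuple[str, str, int]
Genotype = Optional[Tuple[str, ...]]


def genotype_concordance(
    gbs_calls: Dict[Key, Genotype],
    reference_calls: Dict[Key, Genotype],
) -> Tuple[float, int]:
    """Return (fraction of matching genotypes, number of genotypes compared).

    Only genotypes called in BOTH data sets are compared; missing calls
    are skipped rather than counted as errors.
    """
    compared = 0
    matching = 0
    for key, gbs_gt in gbs_calls.items():
        ref_gt = reference_calls.get(key)
        if gbs_gt is None or ref_gt is None:
            continue  # missing in one data set: not informative
        compared += 1
        # Sort alleles so that unphased A/G and G/A count as the same genotype.
        if sorted(gbs_gt) == sorted(ref_gt):
            matching += 1
    if compared == 0:
        raise ValueError("no genotypes in common between the two data sets")
    return matching / compared, compared


# Toy data, invented for illustration only.
gbs = {
    ("s1", "chr1", 100): ("A", "G"),
    ("s1", "chr1", 200): ("C", "C"),
    ("s2", "chr1", 100): ("A", "A"),
    ("s2", "chr1", 200): None,  # missing GBS call, skipped
}
reference = {
    ("s1", "chr1", 100): ("G", "A"),  # same genotype, different allele order
    ("s1", "chr1", 200): ("C", "T"),  # discordant
    ("s2", "chr1", 100): ("A", "A"),
    ("s2", "chr1", 200): ("C", "C"),
}

rate, n = genotype_concordance(gbs, reference)
print(f"concordance: {rate:.3f} over {n} genotypes")
```

Skipping missing calls instead of counting them as mismatches is a design choice. It measures the accuracy of the genotypes that were actually called, which is a separate question from how many genotypes are missing.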
We conclude that Fast-GBS provides a highly efficient and reliable tool for calling SNPs from GBS data.