Department of Integrative Biology, Department of Statistics, University of California, Berkeley, CA 94720, USA and Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark.
Bioinformatics. 2014 May 15;30(10):1486-7. doi: 10.1093/bioinformatics/btu041. Epub 2014 Jan 23.
Next-generation sequencing technologies produce short reads that are either de novo assembled or mapped to a reference genome. Genotypes and/or single-nucleotide polymorphisms are then determined from the read composition at each site and become the basis for many downstream analyses. However, at low sequencing depths there is considerable statistical uncertainty in the assignment of genotypes, because the two homologous alleles of a heterozygote are sampled at random and because of sequencing or alignment errors. Recently, several probabilistic methods have been proposed to account for this uncertainty and to make accurate inferences from low-quality and/or low-coverage sequencing data. We present ngsTools, a collection of programs to perform population genetics analyses from next-generation sequencing data. The methods implemented in these programs do not rely on single-nucleotide polymorphism or genotype calling and are particularly suitable for low sequencing depth data.
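The uncertainty described above can be illustrated with a minimal sketch of diploid genotype likelihoods. This is not the ngsTools implementation: the function names, the single fixed `error_rate`, the symmetric error model (an erroneous read shows each of the other three bases with equal probability) and the uniform prior over genotypes are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement

ALLELES = "ACGT"


def genotype_likelihoods(bases, error_rate=0.01):
    """Return P(observed bases | genotype) for the 10 unordered diploid genotypes.

    Assumed model (illustrative only): each read samples one of the two
    alleles with probability 1/2, and a sequencing error turns the sampled
    allele into one of the three other bases with equal probability.
    """
    likelihoods = {}
    for a1, a2 in combinations_with_replacement(ALLELES, 2):
        lik = 1.0
        for base in bases:
            # Probability of seeing `base`, averaged over the two alleles.
            p = 0.0
            for allele in (a1, a2):
                if base == allele:
                    p += 0.5 * (1.0 - error_rate)
                else:
                    p += 0.5 * (error_rate / 3.0)
            lik *= p
        likelihoods[a1 + a2] = lik
    return likelihoods


def genotype_posteriors(bases, error_rate=0.01):
    """Normalize the likelihoods under an assumed uniform prior on genotypes."""
    lik = genotype_likelihoods(bases, error_rate)
    total = sum(lik.values())
    return {genotype: value / total for genotype, value in lik.items()}


if __name__ == "__main__":
    # Two reads versus twenty reads, all showing base A.
    for reads in ("AA", "A" * 20):
        post = genotype_posteriors(reads)
        best = max(post, key=post.get)
        print(f"depth={len(reads):2d} best={best} posterior={post[best]:.3f}")
```

Under these assumptions, two reads that both show A leave substantial posterior probability on the heterozygous genotypes, whereas twenty such reads make the homozygote AA close to certain. This is why methods that carry the full set of genotype likelihoods forward, rather than a single called genotype, are attractive at low depth.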
Programs included in ngsTools are implemented in C/C++ and are freely available for noncommercial use at https://github.com/mfumagalli/ngsTools.
Supplementary materials are available at Bioinformatics online.