Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Bioinformatics. 2009 Aug 15;25(16):2074-5. doi: 10.1093/bioinformatics/btp344. Epub 2009 Jun 3.
Here, we present a method for estimating the frequencies of SNP alleles present within pooled samples of DNA using high-throughput short-read sequencing. The method was tested on real data from six strains of the highly monomorphic pathogen Salmonella Paratyphi A, sequenced both individually and in a pool. A variety of read mapping and quality-weighting procedures were tested to determine the optimal parameters, which afforded ≥80% sensitivity of SNP detection and strong correlation with true SNP frequency at a pool-wide read depth of 40x, declining only slightly at read depths of 20-40x.
The method was implemented in Perl and relies on the open-source software Maq for read mapping and SNP calling. The Perl script is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/pools/.
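To illustrate the general idea of quality-weighted allele frequency estimation at a single site in a pool, a minimal sketch in Python follows. It is not the published Perl script and does not reproduce the authors' weighting scheme: the function names, the pileup representation as (base, Phred quality) pairs, and the choice of weight 1 - 10^(-Q/10) are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def phred_to_weight(quality: int) -> float:
    """Probability that a base call is correct, given its Phred score.

    Assumed weighting for this sketch; the published method may differ.
    """
    return 1.0 - 10.0 ** (-quality / 10.0)


def weighted_allele_frequency(
    pileup: Iterable[Tuple[str, int]],
    alt_allele: str,
    min_quality: int = 0,
) -> Optional[float]:
    """Estimate the frequency of alt_allele at one site in a pooled sample.

    pileup      -- (base, Phred quality) pairs for reads covering the site
    alt_allele  -- the candidate SNP allele
    min_quality -- hypothetical cutoff; bases below it are ignored

    Returns None when no usable bases cover the site.
    """
    alt_weight = 0.0
    total_weight = 0.0
    for base, quality in pileup:
        if quality < min_quality:
            continue
        weight = phred_to_weight(quality)
        total_weight += weight
        if base == alt_allele:
            alt_weight += weight
    if total_weight == 0.0:
        return None
    return alt_weight / total_weight


# Example: six reads of equal quality, one carrying the alternate allele,
# as might be seen if one of six pooled strains carries the SNP.
site = [("A", 30)] * 5 + [("G", 30)]
estimate = weighted_allele_frequency(site, "G")
```

With equal base qualities the estimate reduces to the plain read-count fraction; the weighting only changes the result when the reads supporting the two alleles differ in quality.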