Cao Changchang, Pan Rongfang, Tan Jun, Sun Xiao, Xiao Pengfeng
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, China.
Mol Genet Genomics. 2017 Oct;292(5):1069-1081. doi: 10.1007/s00438-017-1332-2. Epub 2017 Jun 13.
Identifying single nucleotide polymorphisms (SNPs) from pooled samples is critical for many studies and applications. SNPs called from next-generation sequencing results may suffer from errors in both base calling and read mapping. Taking advantage of dual mononucleotide addition-based pyrosequencing, we present Epds, a method to efficiently identify SNPs from pooled DNA samples. Because dual mononucleotide addition-based pyrosequencing yields only five patterns of non-synchronous extension between the wild-type and mutant sequences, we employed an enumerative algorithm to infer the mutant locus and estimate the proportion of the mutant sequence. From the profiles produced by three runs with distinct dual mononucleotide additions, Epds could recover the mutant bases. Results showed that our method had a false-positive rate of less than 3%. A series of simulations revealed that Epds outperformed the current method (PSM) in many situations. Finally, experiments based on profiles produced by real sequencing showed that our method could be successfully applied to the identification of mutants from pooled samples. The software implementing this method and the experimental data are available at http://bioinfo.seu.edu.cn/Epds.
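The enumeration idea described in the abstract can be illustrated with a minimal Python sketch. It is not the Epds implementation: it assumes an idealized, noise-free signal of one unit per incorporated base, treats the input as the strand being synthesized, and replaces the five-pattern analysis with a brute-force search over every single-base substitution. The function names (`dual_profile`, `mix_profiles`, `infer_snp`) and the fixed flow count are hypothetical choices made for this example. The three runs correspond to the three possible ways of splitting A, C, G and T into two dual-nucleotide dispensations.

```python
from itertools import product

# The three ways to split A, C, G, T into two dual-nucleotide dispensations.
RUNS = [("AC", "GT"), ("AG", "CT"), ("AT", "CG")]


def dual_profile(seq, run, n_flows):
    """Idealized signal per flow: bases incorporated while alternating the two pairs."""
    profile, pos = [], 0
    for flow in range(n_flows):
        allowed = run[flow % 2]
        start = pos
        # Extension continues through every consecutive base in the dispensed pair.
        while pos < len(seq) and seq[pos] in allowed:
            pos += 1
        profile.append(pos - start)
    return profile


def mix_profiles(wild, mutant, fraction, n_flows):
    """Concatenate, over the three runs, the expected profile of a wild/mutant pool."""
    mixed = []
    for run in RUNS:
        w = dual_profile(wild, run, n_flows)
        m = dual_profile(mutant, run, n_flows)
        mixed += [(1 - fraction) * a + fraction * b for a, b in zip(w, m)]
    return mixed


def infer_snp(wild, observed, n_flows):
    """Enumerate single-base substitutions; return the best least-squares fit.

    Returns (position, mutant_base, estimated_mutant_fraction).
    """
    base = mix_profiles(wild, wild, 0.0, n_flows)
    resid = [o - w for o, w in zip(observed, base)]
    best = None
    for pos, alt in product(range(len(wild)), "ACGT"):
        if alt == wild[pos]:
            continue
        mutant = wild[:pos] + alt + wild[pos + 1:]
        pure = mix_profiles(wild, mutant, 1.0, n_flows)
        delta = [m - w for m, w in zip(pure, base)]
        denom = sum(d * d for d in delta)
        # Closed-form least-squares estimate of the mutant fraction, clipped to [0, 1].
        frac = sum(r * d for r, d in zip(resid, delta)) / denom if denom else 0.0
        frac = min(1.0, max(0.0, frac))
        error = sum((r - frac * d) ** 2 for r, d in zip(resid, delta))
        if best is None or error < best[0]:
            best = (error, pos, alt, frac)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    wild = "ACGT"
    n_flows = len(wild) + 1  # enough flows to consume the whole toy sequence
    observed = mix_profiles(wild, "ATGT", 0.2, n_flows)  # 20% mutant pool
    print(infer_snp(wild, observed, n_flows))
```

In this toy case the C-to-T change leaves the AG/CT run unchanged, because C and T are dispensed together there, while the other two runs shift. That is why a single run cannot name the mutant base and the three runs are fitted jointly.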