Wang Kaile, Ma Qin, Jiang Lan, Lai Shujuan, Lu Xuemei, Hou Yali, Wu Chung-I, Ruan Jue
Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
University of Chinese Academy of Sciences, Beijing, China.
BMC Genomics. 2016 Mar 9;17:214. doi: 10.1186/s12864-016-2480-1.
Next-generation sequencing (NGS) has been widely used in studies of biological processes ranging from microbial evolution to cancer genomics. However, the error rate of NGS (0.1% to 1%) remains a major obstacle to comprehensively investigating low-frequency variants, and existing solutions suffer from severe amplification bias or low efficiency.
We developed Droplet-CirSeq for relatively efficient, low-bias and ultra-sensitive identification of variants by combining millions of uniformly sized picoliter droplets with Cir-seq. Droplet-CirSeq achieves an error rate as low as 3 to 5 × 10^-6. To systematically evaluate the amplification uniformity and mutation-detection capability of Droplet-CirSeq, we used mixtures of two E. coli strains to simulate mutations present at different frequencies. Compared with Cir-seq, the coefficient of variation of read depth for Droplet-CirSeq was 10-fold lower (p = 2.6 × 10^-3), and the measured allele frequencies clustered more tightly around the true mixture frequencies (p = 4.8 × 10^-3), indicating a significant reduction in amplification bias and a significant improvement in the accuracy of allele frequency determination. Additionally, with a 3 pg DNA input, Droplet-CirSeq detected 2.5 times as many genuine SNPs (p < 0.001) and achieved a 2.8-fold lower false positive rate (p < 0.05) and a 1.5-fold lower false negative rate (p < 0.001). Notably, the false positive sites consisted predominantly of two types of base substitution (G>A and C>T). Our findings indicated that a 30 pg DNA input distributed across 5 to 10 million droplets yielded the highest detection of authentic mutations, compared with 3 pg (p = 1.2 × 10^-8) and 300 pg (p = 2.2 × 10^-3) inputs.
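As an illustration of the uniformity metric mentioned above, the coefficient of variation (CV) of read depth is the standard deviation of per-position depth divided by the mean depth, so a lower CV means more even coverage. The sketch below is not the authors' pipeline: the function name `coefficient_of_variation` and both depth lists are hypothetical, made-up values chosen only to show how the metric separates even from uneven coverage.

```python
from statistics import mean, pstdev


def coefficient_of_variation(depths):
    """Return the population standard deviation of depths divided by their mean."""
    mu = mean(depths)
    if mu == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(depths) / mu


# Hypothetical per-position read depths, invented for illustration only.
uniform_depths = [98, 102, 100, 101, 99]   # even coverage
skewed_depths = [10, 250, 40, 180, 20]     # coverage distorted by amplification bias

cv_uniform = coefficient_of_variation(uniform_depths)
cv_skewed = coefficient_of_variation(skewed_depths)
print(f"uniform CV = {cv_uniform:.3f}, skewed CV = {cv_skewed:.3f}")
```

Both lists have the same mean depth, so the difference in CV reflects only how unevenly the reads are spread across positions. The population standard deviation is used here as an assumption; a sample standard deviation would be an equally reasonable choice.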
We developed a method, Droplet-CirSeq, that significantly reduces amplification bias and shows clear advantages over currently prevalent methods for detecting ultra-low-frequency mutations. Droplet-CirSeq is a promising approach for identifying low-frequency mutations from extremely low DNA input, such as DNA from uncultured microorganisms, captured DNA from target regions, and circulating DNA in plasma. Its concept of rolling circle amplification in droplets could also be applied to other low-input DNA amplification settings.