Ecology and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan.
Mol Ecol Resour. 2015 Mar;15(2):329-36. doi: 10.1111/1755-0998.12314. Epub 2014 Sep 5.
RAD-tag is a powerful tool for high-throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four-base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double-digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample-to-sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low-input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low-input material.
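The duplicate-discrimination idea described above can be illustrated with a minimal sketch. It assumes that each aligned read has already been annotated with its locus (the restriction-site start position) and with the four-base adaptor tag recovered from the index read. The `Read` record, its field names, and both helper functions are hypothetical and are not taken from the authors' pipeline. Reads sharing both locus and tag are treated as PCR copies of one template molecule, and only the first is kept.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one aligned read (illustration only)."""
    locus: str      # assumed identifier for the alignment start / restriction site
    tag: str        # four-base adaptor tag taken from the index read
    sequence: str   # read sequence


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (locus, tag) combination."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.locus, read.tag)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def duplication_rate(reads: Iterable[Read]) -> float:
    """Fraction of reads discarded as putative PCR duplicates."""
    reads = list(reads)
    if not reads:
        return 0.0
    return 1.0 - len(remove_pcr_duplicates(reads)) / len(reads)


# Toy input: two reads at locus1 share a tag, so one of them is a duplicate.
example = [
    Read("locus1", "ACGT", "AAA"),
    Read("locus1", "ACGT", "AAA"),
    Read("locus1", "TTGA", "AAA"),
    Read("locus2", "ACGT", "CCC"),
]
print(len(remove_pcr_duplicates(example)))
print(duplication_rate(example))
```

If each tag position is drawn from A, C, G and T, a four-base tag can take at most 4^4 = 256 values, so two distinct template molecules at the same locus can occasionally share a tag by chance. This sketch ignores such collisions, as well as sequencing errors within the tag.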