Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy.

Affiliation

Ecology and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan.

Publication information

Mol Ecol Resour. 2015 Mar;15(2):329-36. doi: 10.1111/1755-0998.12314. Epub 2014 Sep 5.

DOI:10.1111/1755-0998.12314

PMID:25132578

Abstract

RAD-tag is a powerful tool for high-throughput genotyping. It relies on PCR amplification of the starting material, following enzymatic digestion and sequencing adaptor ligation. Amplification introduces duplicate reads into the data, which arise from the same template molecule and are statistically nonindependent, potentially introducing errors into genotype calling. In shotgun sequencing, data duplicates are removed by filtering reads starting at the same position in the alignment. However, restriction enzymes target specific locations within the genome, causing reads to start in the same place, and making it difficult to estimate the extent of PCR duplication. Here, we introduce a slight change to the Illumina sequencing adaptor chemistry, appending a unique four-base tag to the first index read, which allows duplicate discrimination in aligned data. This approach was validated on the Illumina MiSeq platform, using double-digest libraries of ants (Wasmannia auropunctata) and yeast (Saccharomyces cerevisiae) with known genotypes, producing modest though statistically significant gains in the odds of calling a genotype accurately. More importantly, removing duplicates also corrected for strong sample-to-sample variability of genotype calling accuracy seen in the ant samples. For libraries prepared from low-input degraded museum bird samples (Mixornis gularis), which had low complexity, having been generated from relatively few starting molecules, adaptor tags show that virtually all of the genotypes were called with inflated confidence as a result of PCR duplicates. Quantification of library complexity by adaptor tagging does not significantly increase the difficulty of the overall workflow or its cost, but corrects for differences in quality between samples and permits analysis of low-input material.
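The duplicate discrimination described in the abstract can be illustrated with a minimal sketch. It assumes each aligned read has already been reduced to a record holding its alignment position and the four-base tag taken from the index read; the `Read` type, its field names, and both function names are hypothetical and are not taken from the paper or any published pipeline. Reads that share both position and tag are treated as PCR duplicates, while reads that share only the position are kept as distinct template molecules.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one aligned read."""
    name: str
    chrom: str
    start: int
    strand: str
    tag: str  # four-base degenerate tag from the index read (assumed field)


def dedupe_by_tag(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, start, strand, tag) key."""
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.tag)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def duplication_rate(reads: List[Read]) -> float:
    """Fraction of reads discarded as duplicates (0.0 for empty input)."""
    if not reads:
        return 0.0
    return 1 - len(dedupe_by_tag(reads)) / len(reads)


# Made-up example reads: r1 and r2 share position and tag, so r2 is a duplicate.
# r3 starts at the same restriction site but carries a different tag, so it is kept.
example_reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),
    Read("r3", "chr1", 100, "+", "TTGA"),
    Read("r4", "chr1", 250, "-", "ACGT"),
]

unique_reads = dedupe_by_tag(example_reads)
rate = duplication_rate(example_reads)
```

A four-base tag allows only 4^4 = 256 distinct sequences, so two different template molecules at the same site can occasionally carry the same tag; a sketch like this would count such pairs as duplicates.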

Similar articles

Detection and removal of PCR duplicates in population genomic ddRAD studies by addition of a degenerate base region (DBR) in sequencing adapters.

Biol Bull. 2014 Oct;227(2):146-60. doi: 10.1086/BBLv227n2p146.

A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):43. doi: 10.1186/s12859-017-1471-9.

RADcap: sequence capture of dual-digest RADseq libraries with identifiable duplicates and reduced missing data.

Mol Ecol Resour. 2016 Sep;16(5):1264-78. doi: 10.1111/1755-0998.12566.

Substantial differences in bias between single-digest and double-digest RAD-seq libraries: A case study.

Mol Ecol Resour. 2018 Mar;18(2):264-280. doi: 10.1111/1755-0998.12734. Epub 2017 Nov 30.

High throughput HLA genotyping using 454 sequencing and the Fluidigm Access Array™ System for simplified amplicon library preparation.

Tissue Antigens. 2013 Mar;81(3):141-9. doi: 10.1111/tan.12071.

SNP discovery and genotyping using restriction-site-associated DNA sequencing in chickens.

Anim Genet. 2015 Apr;46(2):216-9. doi: 10.1111/age.12250. Epub 2015 Jan 15.

On the causes, consequences, and avoidance of PCR duplicates: Towards a theory of library complexity.

Mol Ecol Resour. 2023 Aug;23(6):1299-1318. doi: 10.1111/1755-0998.13800. Epub 2023 Apr 16.

Double-digest RAD sequencing using Ion Proton semiconductor platform (ddRADseq-ion) with nonmodel organisms.

Mol Ecol Resour. 2015 Nov;15(6):1316-29. doi: 10.1111/1755-0998.12406. Epub 2015 Apr 6.

Removing duplicate reads using graphics processing units.

BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):346. doi: 10.1186/s12859-016-1192-5.

Cited by

Species ecology explains the spatial components of genetic diversity in tropical reef fishes.

Proc Biol Sci. 2021 Sep 29;288(1959):20211574. doi: 10.1098/rspb.2021.1574.

Functional innovation promotes diversification of form in the evolution of an ultrafast trap-jaw mechanism in ants.

PLoS Biol. 2021 Mar 2;19(3):e3001031. doi: 10.1371/journal.pbio.3001031. eCollection 2021 Mar.

Intraspecific niche partition without speciation: individual level web polymorphism within a single island spider population.

Proc Biol Sci. 2021 Feb 24;288(1945):20203138. doi: 10.1098/rspb.2020.3138. Epub 2021 Feb 17.

Uneven Missing Data Skew Phylogenomic Relationships within the Lories and Lorikeets.

Genome Biol Evol. 2020 Jul 1;12(7):1131-1147. doi: 10.1093/gbe/evaa113.

Colonize, radiate, decline: Unraveling the dynamics of island community assembly with Fijian trap-jaw ants.

Evolution. 2020 Jun;74(6):1082-1097. doi: 10.1111/evo.13983. Epub 2020 May 10.

Applying a Linear Amplification Strategy to Recombinase Polymerase Amplification for Uniform DNA Library Amplification.

ACS Omega. 2019 Nov 12;4(22):19953-19958. doi: 10.1021/acsomega.9b02886. eCollection 2019 Nov 26.

Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better?

Front Genet. 2019 May 29;10:533. doi: 10.3389/fgene.2019.00533. eCollection 2019.

A comparison of different methods for preserving plant molecular materials and the effect of degraded DNA on ddRAD sequencing.

Plant Divers. 2018 Apr 22;40(3):106-116. doi: 10.1016/j.pld.2018.04.001. eCollection 2018 Jun.

Novel genome and genome-wide SNPs reveal early fragmentation effects in an edge-tolerant songbird population across an urbanized tropical metropolis.

Sci Rep. 2018 Aug 24;8(1):12804. doi: 10.1038/s41598-018-31074-5.

From next-generation resequencing reads to a high-quality variant data set.

Heredity (Edinb). 2017 Feb;118(2):111-124. doi: 10.1038/hdy.2016.102. Epub 2016 Oct 19.
