

Identification and quantification of chimeric sequencing reads in a highly multiplexed RAD-seq protocol.

Affiliations

Department of Biological and Geographical Sciences, School of Applied Sciences, University of Huddersfield, Huddersfield, UK.

IFM Biology, Linköping University, Linköping, Sweden.

Publication information

Mol Ecol Resour. 2022 Nov;22(8):2860-2870. doi: 10.1111/1755-0998.13661. Epub 2022 Jun 27.

Abstract

Highly multiplexed approaches have become common in genomic studies. They have improved the cost-effectiveness of genotyping hundreds of individuals using combinatorially barcoded adapters. These strategies, however, can misassign reads to incorrect samples. Here, we used a modified quaddRAD protocol to analyse the occurrence of index hopping and PCR chimeras in a series of experiments with up to 100 multiplexed samples per sequencing lane (639 samples in total). We created two types of sequencing libraries: four libraries of type A, where PCRs were run on individual samples before multiplexing, and three libraries of type B, where PCRs were run on pooled samples. We used fixed pairs of inner barcodes to identify chimeric reads. Type B libraries show a higher percentage of misassigned reads (1.15%) than type A libraries (0.65%). We also quantify the commonly undetectable chimeric sequences that occur whenever multiplexed groups of samples with different outer barcodes are sequenced together on a single flow cell. Our results suggest that these types of chimeric sequences represent up to 1.56% and 1.29% of reads in type A and B libraries, respectively. We also show that allowing more than two mismatches for barcode rescue dramatically increases the number of recovered chimeric reads. We provide recommendations for developing highly multiplexed RAD-seq protocols and for analysing the resulting data. These recommendations aim to minimize the generation of chimeric sequences, allow their quantification, and give finer control over the number of PCR cycles needed to generate enough input DNA for library preparation.
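The idea of using fixed pairs of inner barcodes to flag chimeric reads, together with a mismatch allowance for barcode rescue, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the barcode sequences, the function names (`hamming`, `rescue`, `classify`), and the rule that a rescue must be unambiguous are all assumptions made for illustration. A read counts as "valid" when its two inner barcodes form one of the expected pairs, "chimeric" when both barcodes are recognised but do not belong together, and "unassigned" otherwise.

```python
from typing import Optional

# Hypothetical fixed inner-barcode pairs (forward -> reverse); not from the paper.
FIXED_PAIRS = {
    "ACGTAC": "TGCATG",
    "GATCGA": "CTAGCT",
    "TTGACC": "AACTGG",
}

def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def rescue(observed: str, known: list[str], max_mismatches: int) -> Optional[str]:
    """Return the closest known barcode if it is within max_mismatches
    and strictly closer than every other candidate; otherwise None."""
    scored = sorted((hamming(observed, k), k) for k in known if len(k) == len(observed))
    if not scored or scored[0][0] > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two known barcodes are equally close
    return scored[0][1]

def classify(fwd: str, rev: str, pairs: dict[str, str], max_mismatches: int = 1) -> str:
    """Label a read as 'valid', 'chimeric', or 'unassigned' from its inner barcodes."""
    f = rescue(fwd, list(pairs.keys()), max_mismatches)
    r = rescue(rev, list(pairs.values()), max_mismatches)
    if f is None or r is None:
        return "unassigned"
    return "valid" if pairs[f] == r else "chimeric"

if __name__ == "__main__":
    print(classify("ACGTAC", "TGCATG", FIXED_PAIRS))  # expected pair
    print(classify("ACGTAC", "CTAGCT", FIXED_PAIRS))  # barcodes from two different pairs
    print(classify("ACGTAA", "TGCATG", FIXED_PAIRS, max_mismatches=0))  # one mismatch, no rescue
```

Raising `max_mismatches` in this sketch lets more damaged barcodes be assigned to a known barcode, which is the kind of trade-off the abstract describes for barcode rescue.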


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c292/9796921/6de443f8c8ae/MEN-22-2860-g006.jpg
