Boyce Thompson Institute, Ithaca, NY, USA.
Plant Biology Section, Cornell University, Ithaca, NY, USA.
Methods Mol Biol. 2023;2545:189-206. doi: 10.1007/978-1-0716-2561-3_10.
Inferring the true biological sequences from amplicon mixtures remains a difficult bioinformatics problem. The traditional approach is to cluster sequencing reads by similarity thresholds and treat the consensus sequence of each cluster as an "operational taxonomic unit" (OTU). Recently, this approach has been improved by model-based methods that correct PCR and sequencing errors in order to infer "amplicon sequence variants" (ASVs). To date, ASV approaches have been used primarily in metagenomics, but they are also useful for determining homeologs in polyploid organisms. To facilitate the use of ASV methods among polyploidy researchers, we incorporated ASV inference alongside OTU clustering in PURC v2.0, a major update to PURC (Pipeline for Untangling Reticulate Complexes). In addition, PURC v2.0 features faster demultiplexing than the original version and has been updated to be compatible with Python 3. In this chapter, we present results indicating that the ASV approach is more likely to infer the correct biological sequences than the earlier OTU-based PURC, and we describe how to prepare sequencing data, run PURC v2.0 in several different modes, and interpret the output.
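The threshold-based OTU approach described above can be illustrated with a minimal Python sketch. This is not PURC's implementation: it assumes the reads are already aligned to equal length, uses a simple greedy clustering against the first read of each cluster, and all function names (`identity`, `cluster_reads`, `consensus`) and the toy reads are hypothetical, chosen only for illustration.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_reads(reads, threshold=0.97):
    """Greedy clustering (illustrative assumption, not PURC's algorithm):
    each read joins the first cluster whose seed read it matches at or
    above the threshold; otherwise it seeds a new cluster."""
    clusters = []  # each cluster is a list whose first element is the seed
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Majority base at each alignment column of a cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


# Toy, made-up reads of length 10, clustered at a 90% identity threshold.
reads = [
    "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT",
    "TTGGCCAATT", "TTGGCCAATT", "TTGGCCAATA",
]
clusters = cluster_reads(reads, threshold=0.9)
otus = [consensus(c) for c in clusters]
print(otus)
```

In this toy case the single-base variants are absorbed into the two consensus sequences. Whether such a variant is a sequencing error or a genuine homeolog cannot be decided by the threshold alone, which is the limitation that model-based ASV inference is meant to address.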