Westfall Dylan H, Deng Wenjie, Pankow Alec, Murrell Hugh, Chen Lennie, Zhao Hong, Williamson Carolyn, Rolland Morgane, Murrell Ben, Mullins James I
Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA.
Department of Pathology, Division of Medical Virology, University of Cape Town and National Health Laboratory Services, Observatory, Cape Town 7925, South Africa.
Virus Evol. 2024 Mar 2;10(1):veae019. doi: 10.1093/ve/veae019. eCollection 2024.
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, and overcoming these errors can require extensive optimization. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time (SMRT) platform was used to sequence polymerase chain reaction (PCR) amplicons derived from cDNA templates tagged with unique molecular identifiers (UMIs), an approach referred to here as SMRT-UMI. Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR. The use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Generating highly accurate sequences from the large datasets that SMRT-UMI sequencing produces is facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline). PORPIDpipeline automatically filters and parses circular consensus reads by sample, identifies and discards reads with UMIs likely created by PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination, heteroduplex formation, or early-cycle PCR errors. The optimized SMRT-UMI sequencing and PORPIDpipeline methods presented here represent a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus quasispecies in a virus transmitter-recipient pair of individuals.
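The core idea behind UMI-based error removal can be illustrated with a minimal sketch. It is not the PORPIDpipeline implementation: the function name `umi_consensus`, the `min_family_size` threshold, and the toy reads are all hypothetical, and the sketch assumes reads within a UMI family are already aligned to equal length. Reads are grouped by UMI, small families (which may stem from erroneous UMIs) are discarded with a fixed size cutoff rather than a probabilistic model, and a per-position majority vote yields one consensus per template.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=3):
    """Group (umi, sequence) pairs by UMI and return one consensus per family.

    Assumption: sequences sharing a UMI are pre-aligned and of equal length.
    Families smaller than min_family_size are dropped (hypothetical cutoff).
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to trust this UMI
        # Majority vote at each aligned position removes sporadic point errors.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data: two genuine templates plus one singleton UMI.
reads = [
    ("AAGT", "ACGTACGT"),
    ("AAGT", "ACGTACGT"),
    ("AAGT", "ACGAACGT"),  # simulated PCR or sequencing point error
    ("CCTA", "ACGTTCGT"),
    ("CCTA", "ACGTTCGT"),
    ("CCTA", "ACGTTCGT"),
    ("AAGA", "ACGTACGT"),  # singleton family, discarded by the cutoff
]
result = umi_consensus(reads)
print(result)
```

In this sketch the number of retained families (`len(result)`) serves as the template count, mirroring the template quantitation that UMIs make possible. The additional checks the abstract lists (contamination, PCR recombination, heteroduplexes) are not modeled here.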