Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA.
Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA.
Mol Biol Evol. 2021 Apr 13;38(4):1677-1690. doi: 10.1093/molbev/msaa315.
Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to investigate viral evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intrahost viral molecular epidemiology. Furthermore, existing analytical pipelines may analyze only the genomic regions involved in drug resistance and are therefore not suited for full viral genome analysis. Here, we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to produce high-quality output properly formatted for downstream evolutionary analyses.
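To illustrate why a single consensus sequence discards intrahost information, the following minimal Python sketch contrasts a majority-rule consensus with per-haplotype frequencies. It is a toy illustration only and is not HAPHPIPE's method: the function names `consensus` and `haplotype_frequencies` and the example reads are hypothetical, and the sketch assumes reads that are already aligned, of equal length, and span the whole region, which real short-read data generally do not.

```python
from collections import Counter


def consensus(reads):
    """Return the majority-rule base at each column of equal-length aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def haplotype_frequencies(reads):
    """Return the fraction of reads carrying each distinct full-length sequence."""
    counts = Counter(reads)
    total = len(reads)
    return {haplotype: n / total for haplotype, n in counts.items()}


# Hypothetical toy sample: one dominant variant and two minority variants.
reads = ["ACGTAC"] * 6 + ["ACGTTC"] * 3 + ["GCGTAC"] * 1

print(consensus(reads))              # a single sequence for the whole sample
print(haplotype_frequencies(reads))  # three variants with their frequencies
```

In this toy sample the consensus is identical to the dominant variant, so the two minority variants, which together make up 40% of the reads, are invisible in the consensus-only summary.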