Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA.
Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA.
Viruses. 2020 Jul 14;12(7):758. doi: 10.3390/v12070758.
Next-generation sequencing (NGS) offers a powerful opportunity to identify low-abundance, intra-host viral sequence variants, yet the focus of many bioinformatic tools on consensus sequence construction has precluded a thorough analysis of intra-host diversity. To take full advantage of the resolution of NGS data, we developed HAplotype PHylodynamics PIPEline (HAPHPIPE), an open-source tool for the de novo and reference-based assembly of viral NGS data that performs consensus sequence assembly and also quantifies intra-host variation through haplotype reconstruction. We validate and compare the consensus sequence assembly methods of HAPHPIPE with those of two alternative software packages, HyDRA and Geneious, using simulated HIV data and empirical HIV, HCV, and SARS-CoV-2 datasets. Our validation methods included read mapping, genetic distance, and genetic diversity metrics. In simulated NGS data, HAPHPIPE generated consensus sequences significantly closer to the true consensus sequence than those produced by HyDRA and Geneious, and it performed comparably to Geneious for HIV sequences. Furthermore, using empirical data from multiple viruses, we demonstrate that HAPHPIPE can analyze larger sequence datasets because of its greater computational speed. We therefore contend that HAPHPIPE provides a more user-friendly platform than other currently available options for researchers, with or without bioinformatics experience, to implement current best practices for viral NGS assembly.
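The abstract names genetic distance and genetic diversity as validation metrics but does not define them. The following minimal sketch assumes the simplest choices: an uncorrected p-distance between two sequences that are already aligned to the same length, and an unweighted mean pairwise p-distance among reconstructed haplotypes. It shows how an assembled consensus might be scored against a known true consensus. The function names are hypothetical and are not part of HAPHPIPE. The published analysis may use a substitution model, frequency weighting, or a different treatment of gaps and ambiguous bases.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, ignore: str = "-N") -> float:
    """Uncorrected p-distance between two pre-aligned sequences (hypothetical helper).

    Columns where either sequence has a gap or an ambiguous N are skipped,
    which is an assumption, not necessarily HAPHPIPE's convention.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignore or b in ignore:
            continue  # uninformative column: exclude from the denominator
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the sequences")
    return differences / compared


def mean_pairwise_diversity(haplotypes: list[str]) -> float:
    """Unweighted mean p-distance over all unordered pairs of haplotypes.

    Haplotype frequencies are ignored here; a frequency-weighted estimate
    would be a natural refinement.
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0  # a single haplotype carries no pairwise diversity
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    true_consensus = "ACGTACGTAC"
    assembled = "ACGTACGTTC"  # one mismatch over ten comparable sites
    print(p_distance(true_consensus, assembled))  # 0.1
```

Under this sketch, a lower p-distance to the true consensus means a more accurate assembly. The diversity function summarizes how different the reconstructed intra-host haplotypes are from one another.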