Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.
Bioinformatics. 2015 Mar 1;31(5):682-90. doi: 10.1093/bioinformatics/btu726. Epub 2014 Oct 29.
Next-generation sequencing (NGS) makes it possible to analyze a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One possible cost-effective alternative is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations.
We developed a cost-effective and reliable protocol for sequencing of viral samples that combines NGS using barcoding and combinatorial pooling with a computational framework, which includes algorithms for optimal virus-specific pool design and for deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy.
The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.
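The general idea of combinatorial pooling and deconvolution can be illustrated with a toy sketch. This is not the authors' algorithm or their released code: the function names (`design_pools`, `sequence_pools`, `deconvolve`) are hypothetical, and the sketch assumes error-free sequencing, that every variant is detected in each pool containing its sample, and that no variant is shared between samples. Real heterogeneous viral populations violate these assumptions, which is why a virus-specific design and deconvolution are needed.

```python
from itertools import combinations


def design_pools(samples, n_pools, pools_per_sample):
    """Give every sample a distinct set of pool indices (its signature)."""
    signatures = combinations(range(n_pools), pools_per_sample)
    design = {s: frozenset(sig) for s, sig in zip(samples, signatures)}
    if len(design) < len(samples):
        raise ValueError("not enough distinct signatures for all samples")
    return design


def sequence_pools(design, populations, n_pools):
    """Simulate error-free sequencing: a pool yields the union of its samples' variants."""
    pools = [set() for _ in range(n_pools)]
    for sample, signature in design.items():
        for p in signature:
            pools[p] |= populations[sample]
    return pools


def deconvolve(design, pools):
    """Assign each variant to the sample whose signature equals the set of pools containing it."""
    by_signature = {sig: sample for sample, sig in design.items()}
    seen_in = {}
    for idx, pool in enumerate(pools):
        for variant in pool:
            seen_in.setdefault(variant, set()).add(idx)
    recovered = {sample: set() for sample in design}
    for variant, idxs in seen_in.items():
        sample = by_signature.get(frozenset(idxs))
        if sample is not None:  # variants with an unknown signature stay unassigned
            recovered[sample].add(variant)
    return recovered


# Six samples sequenced in four pools, two pools per sample (C(4, 2) = 6 signatures).
samples = [f"s{i}" for i in range(1, 7)]
populations = {s: {f"{s}_v{j}" for j in range(3)} for s in samples}  # made-up variant labels
design = design_pools(samples, n_pools=4, pools_per_sample=2)
pools = sequence_pools(design, populations, n_pools=4)
print(deconvolve(design, pools) == populations)  # True
```

In this toy setting, four pooled sequencing runs stand in for six individual ones. The saving grows with the number of samples, since the number of distinct signatures grows combinatorially with the number of pools.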