Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom.
PLoS Comput Biol. 2010 Dec 16;6(12):e1001022. doi: 10.1371/journal.pcbi.1001022.
Large-scale parallel pyrosequencing produces unprecedented quantities of sequence data. However, when the data are generated from viral populations, current mapping software cannot cope with the high levels of variation present, which creates the potential for biased data loss. To apply the 454 Life Sciences pyrosequencing system to the study of viral populations, we have developed software for processing highly variable sequence data. Here we demonstrate our software by analyzing two temporally sampled HIV-1 intra-patient datasets from a clinical study of maraviroc. This drug binds the CCR5 coreceptor, thus preventing HIV-1 infection of the cell. The objective is to determine viral tropism (CCR5 versus CXCR4 usage) and to track the evolution of minority CXCR4-using variants that may limit the response to a maraviroc-containing treatment regimen. Five time points (two prior to treatment) were available from each patient. We first quantify the effects of divergence on initial read k-mer mapping and demonstrate the importance of using population-specific template sequences when analyzing next-generation sequence data. Then, in conjunction with coreceptor prediction algorithms that infer HIV tropism, we use our software to quantify the viral population structure pre- and post-treatment. In both cases, low-frequency CXCR4-using variants (2.5-15%) were detected prior to treatment. Following phylogenetic inference, these variants were observed to exist as distinct lineages that were maintained through time. Our analysis thus confirms the role of pre-existing CXCR4-using virus in the emergence of maraviroc-insensitive HIV. The software will be useful for studying the intra-host diversity and evolution of other fast-evolving viruses, and is available from http://www.bioinf.manchester.ac.uk/segminator/.
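The effect of divergence on exact k-mer matching can be illustrated with a minimal Python sketch. It is not the algorithm used by the software described above: the function names, the toy sequences, and the deterministic substitution scheme are assumptions made purely for illustration. The sketch measures the fraction of a read's distinct k-mers that occur exactly in a template, which falls as the read diverges from that template.

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(read, template, k):
    """Fraction of the read's distinct k-mers found exactly in the template."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k: nothing to match
        return 0.0
    return len(read_kmers & kmers(template, k)) / len(read_kmers)


def mutate_every(seq, step):
    """Substitute every `step`-th base (hypothetical, deterministic divergence)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(
        swap[base] if i % step == step - 1 else base
        for i, base in enumerate(seq)
    )


if __name__ == "__main__":
    # Toy template, not a real viral sequence.
    template = "ATGAGAGTGAAGGAGAAATATCAGCACTTGTGGAGATGGGGGTGGAGATG"
    read = template[5:45]
    for step in (20, 10, 5):  # smaller step = more substitutions
        diverged = mutate_every(read, step)
        print(step, shared_kmer_fraction(diverged, template, k=8))
```

Because a single substitution disrupts up to k overlapping k-mers, even modest divergence from a generic reference can remove most exact seeds, which is consistent with the abstract's emphasis on population-specific template sequences.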