Watson Institute for International and Public Affairs.
Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, USA.
Bioinformatics. 2019 Jun 1;35(12):2029-2035. doi: 10.1093/bioinformatics/bty919.
Next-generation deep sequencing of viral genomes, particularly on the Illumina platform, is increasingly applied in HIV research. Yet the research community has no standard protocol or method to account for measurement errors that arise during sample preparation and sequencing. Correctly calling high- and low-frequency variants while controlling for erroneous variants is an important precursor to downstream interpretation, such as studying the emergence of HIV drug-resistance mutations, which in turn has clinical applications and can improve patient care.
We developed a new variant-calling pipeline, hivmmer, for Illumina sequences from HIV viral genomes. First, we validated hivmmer by comparing it to other variant-calling pipelines on real HIV plasmid datasets. We found that hivmmer achieves a lower rate of erroneous variants, and that all methods agree on the frequency of correctly called variants. Next, we compared the methods on an HIV plasmid dataset that was sequenced using Primer ID, an amplicon-tagging protocol designed to reduce errors and amplification bias during library preparation. We show that the Primer ID consensus exhibits fewer erroneous variants than the variant-calling pipelines, and that hivmmer approaches this low error rate more closely than the other pipelines. The frequency estimates from the Primer ID consensus do not differ significantly from those of the variant-calling pipelines.
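As a rough illustration of the plasmid-based validation idea, the sketch below counts erroneous variants in a toy set of aligned reads. It is not hivmmer's implementation. It assumes a single clonal plasmid with a known reference sequence, so that any non-reference base is treated as an error, and it assumes reads are already aligned to the reference with `-` marking uncovered positions. The function names `variant_frequencies` and `erroneous_variants`, the nucleotide-level representation, and the frequency threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def variant_frequencies(reads, reference):
    """Return {(position, base): frequency} for every non-reference base.

    Assumption: each read is a string aligned to `reference`, of the same
    length, with '-' at positions the read does not cover.
    """
    coverage = Counter()  # number of reads covering each position
    counts = Counter()    # number of reads showing each non-reference base
    for read in reads:
        for pos, (base, ref) in enumerate(zip(read, reference)):
            if base == "-":
                continue  # position not covered by this read
            coverage[pos] += 1
            if base != ref:
                counts[(pos, base)] += 1
    return {key: n / coverage[key[0]] for key, n in counts.items()}


def erroneous_variants(freqs, threshold):
    """Keep non-reference variants at or above the calling threshold.

    Assumption: the sample is a clonal plasmid, so every variant that
    survives the threshold is counted as an erroneous call.
    """
    return {key: f for key, f in freqs.items() if f >= threshold}


# Toy data: four aligned reads against a four-base plasmid reference.
reference = "ACGT"
reads = ["ACGT", "ACGT", "ACTT", "A-GT"]

freqs = variant_frequencies(reads, reference)
called_errors = erroneous_variants(freqs, threshold=0.2)
```

Under these assumptions, a pipeline that reports fewer entries from `erroneous_variants` at a given threshold would be the one with the lower rate of erroneous variants.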
hivmmer is freely available for non-commercial use from https://github.com/kantorlab/hivmmer.
Supplementary data are available at Bioinformatics online.