Verbist Bie M P, Thys Kim, Reumers Joke, Wetzels Yves, Van der Borght Koen, Talloen Willem, Aerssens Jeroen, Clement Lieven, Thas Olivier
Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia.
Bioinformatics. 2015 Jan 1;31(1):94-101. doi: 10.1093/bioinformatics/btu587. Epub 2014 Aug 31.
In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, knowledge of which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single-base substitutions are the main error source and limit the power to detect low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that help distinguish sequencing errors from real low-frequency mutations.
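The sketch below illustrates how quality scores encode the likelihood of a miscalled base. It decodes a FASTQ quality string and converts each Phred score Q into an error probability using p = 10^(-Q/10). It assumes the Phred+33 encoding used by current Illumina FASTQ output. The function names are hypothetical and are not part of VirVarSeq.

```python
def decode_quality(qual_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 (Sanger / Illumina 1.8+) encoding by default.
    """
    return [ord(ch) - offset for ch in qual_string]


def phred_to_error_prob(q: int) -> float:
    """Return the base-call error probability implied by Phred score q."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # 'I' -> Q40, '5' -> Q20, '+' -> Q10
    scores = decode_quality("I5+")
    for q in scores:
        # A Q10 mismatch is wrong 1 time in 10; a Q40 mismatch 1 time in 10,000.
        print(q, phred_to_error_prob(q))
```

A mismatch supported only by low-Q bases is therefore far more likely to be a sequencing error than a true minority variant. This is the intuition behind filtering on Qs.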
A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from FASTQ files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup reduces the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for each individual sample in each sequencing run. Additionally, because variants are called at the codon level, linkage information between single-nucleotide polymorphisms is retained. This gives virologists an immediate biological interpretation of the reported variants with respect to antiviral drug response. A comparison with existing SNP callers shows that calling variants at the codon level with Q-cpileup yields outstanding sensitivity while maintaining good specificity for variants with frequencies down to 0.5%.
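The sketch below illustrates the codon-level idea: the three bases of a codon are counted together, so nucleotide changes within one codon stay linked. A codon is kept only if all three of its bases pass a quality threshold. This is not the Q-cpileup implementation, and several simplifications are assumed:

- A fixed threshold `min_q` stands in for the tool's adaptive, sample-specific threshold.
- Reads are assumed to be gap-free alignments given as hypothetical `(pos, seq, qual)` tuples.
- The 0.5% reporting cutoff is purely illustrative.

```python
from collections import Counter, defaultdict


def count_codons(reads, ref_start=0, min_q=30, offset=33):
    """Tally codons per codon index from gap-free aligned reads.

    reads: iterable of (pos, seq, qual); pos is the 0-based reference
    coordinate of seq[0] (hypothetical input format; indels not handled).
    ref_start: reference coordinate of the first base of codon 0.
    A codon is counted only if all three bases have Q >= min_q
    (fixed threshold; an illustrative stand-in for an adaptive one).
    """
    counts = defaultdict(Counter)
    for pos, seq, qual in reads:
        for i in range(len(seq) - 2):
            ref_pos = pos + i
            # Only start at the first nucleotide of an in-frame codon.
            if ref_pos < ref_start or (ref_pos - ref_start) % 3 != 0:
                continue
            codon = seq[i:i + 3]
            qs = [ord(c) - offset for c in qual[i:i + 3]]
            if min(qs) >= min_q:
                counts[(ref_pos - ref_start) // 3][codon] += 1
    return counts


def codon_frequencies(counts, min_freq=0.005):
    """Convert codon counts to frequencies, keeping those >= min_freq."""
    result = {}
    for idx, counter in counts.items():
        total = sum(counter.values())
        result[idx] = {c: n / total for c, n in counter.items()
                       if n / total >= min_freq}
    return result


if __name__ == "__main__":
    reads = [
        (0, "ATGAAA", "IIIIII"),
        (0, "ATGAAG", "IIIII+"),  # last base is Q10, so codon 1 is dropped
        (0, "ATGAAG", "IIIIII"),
        (3, "AAA", "III"),
    ]
    print(codon_frequencies(count_codons(reads)))
```

Because whole codons are tallied, a double substitution within one codon appears as a single codon variant rather than two unlinked SNPs. That codon can then be read directly as an amino-acid change.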
VirVarSeq is available, together with a user's guide and test data, on SourceForge: http://sourceforge.net/projects/virtools/?source=directory.