QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

Author information

Van der Borght Koen, Thys Kim, Wetzels Yves, Clement Lieven, Verbist Bie, Reumers Joke, van Vlijmen Herman, Aerssens Jeroen

Affiliations

Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.

Interuniversity Institute for Biostatistics and statistical Bioinformatics, Katholieke Universiteit Leuven, B-3000, Leuven, Belgium.

Publication information

BMC Bioinformatics. 2015 Nov 10;16:379. doi: 10.1186/s12859-015-0812-9.

Abstract

BACKGROUND

Next-generation sequencing enables the study of heterogeneous viral populations within an infection. When sequencing is performed at high coverage depth ("deep sequencing"), low-frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier developed for the Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants (SNVs) from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset from an in silico mixture of five HIV-1 plasmids. We tested our method against the existing methods LoFreq, ShoRAH, and V-Phaser 2 on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset.
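The idea of scoring a candidate variant from quality-score quantiles can be sketched as follows. This is a minimal illustration only: the abstract does not state which quantiles, features, or coefficients QQ-SNV uses, so the quartile features, the function names `quality_quantiles` and `snv_probability`, and any weights passed in are hypothetical.

```python
import math
from statistics import quantiles


def quality_quantiles(phred_scores, n=4):
    """Return the n-1 cut points of the Phred scores (quartiles by default).

    Hypothetical feature choice; the real QQ-SNV feature set is not
    described in the abstract.
    """
    return quantiles(phred_scores, n=n, method="inclusive")


def snv_probability(features, weights, intercept):
    """Logistic model: P(SNV) = 1 / (1 + exp(-(intercept + w . x))).

    `weights` and `intercept` are placeholders, not the trained
    QQ-SNV coefficients.
    """
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Quality scores of the reads supporting a candidate variant (made-up values).
features = quality_quantiles([30, 32, 34, 36, 38])
probability = snv_probability(features, weights=[0.0, 0.0, 0.0], intercept=0.0)
```

With all weights set to zero the model is uninformative and returns 0.5; a trained model would supply coefficients that push true variants toward 1 and sequencing errors toward 0.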

RESULTS

For default application of QQ-SNV, variants were called using an SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve sensitivity, we used an SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, called SNVs were overruled when their frequency was below the 80th percentile of the distribution of error frequencies (QQ-SNV(HS-P80)). On the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but produced more false positives. QQ-SNV(HS-P80) was the most accurate method over all test sets, balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study with a lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) achieved a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when tested on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) across different generations of Illumina sequencers.
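The three calling modes described above can be sketched as a small decision rule. This is an illustrative sketch, not the QQ-SNV implementation: the function names, the use of `>=` at the cutoffs, and the way the error-frequency distribution is supplied are assumptions; only the cutoff values (0.5 and 0.0001) and the 80th-percentile rule come from the abstract.

```python
from statistics import quantiles

CUTOFF_DEFAULT = 0.5       # QQ-SNV(D)
CUTOFF_HIGH_SENS = 0.0001  # QQ-SNV(HS) and QQ-SNV(HS-P80)


def percentile_80(error_frequencies):
    """80th percentile of the observed error frequencies.

    With n=5 the cut points are the 20th, 40th, 60th and 80th percentiles.
    """
    return quantiles(error_frequencies, n=5, method="inclusive")[3]


def call_snv(probability, frequency, mode="D", error_frequencies=None):
    """Return True if the candidate is called as an SNV under `mode`.

    How the error-frequency distribution is estimated is not stated in
    the abstract; here the caller passes it in (an assumption).
    """
    if mode == "D":
        return probability >= CUTOFF_DEFAULT
    if mode == "HS":
        return probability >= CUTOFF_HIGH_SENS
    if mode == "HS-P80":
        if error_frequencies is None:
            raise ValueError("HS-P80 needs the error frequency distribution")
        if probability < CUTOFF_HIGH_SENS:
            return False
        # Overrule calls whose frequency lies below the 80th percentile.
        return frequency >= percentile_80(error_frequencies)
    raise ValueError(f"unknown mode: {mode}")


# Made-up error frequencies for illustration only.
errors = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006]
```

For example, a candidate with probability 0.01 is rejected in mode "D", accepted in mode "HS", and in mode "HS-P80" accepted only if its frequency is at or above the 80th percentile of `errors`.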

CONCLUSIONS

We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.

