Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias.

Author information

Centre for Marine Bioinnovation and School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.

Publication information

BMC Genomics. 2013 Jul 24;14:501. doi: 10.1186/1471-2164-14-501.

Abstract

BACKGROUND

Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next-generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing errors remains challenging, however. Current protocols are characterised by high false positive rates, and their results require time-consuming manual checking.

RESULTS

By statistical modelling, we show that if multiple variant sites are considered at once, SNVs can be called reliably from high-coverage viral deep sequencing data at frequencies lower than the error rate of the sequencing technology, and that SNV calling accuracy increases as true sequence diversity within a read length increases. We demonstrate these findings on two control data sets, showing that SNV detection is more reliable on a high-diversity human immunodeficiency virus sample than on a moderate-diversity sample of hepatitis C virus. Finally, we show that in situations where probabilistic clustering retains false positive SNVs (for instance due to insufficient sample diversity or systematic errors), applying a strand bias test based on a beta-binomial model of forward read distribution can improve precision, with negligible cost to true positive recall.

CONCLUSIONS

By combining probabilistic clustering (implemented in the program ShoRAH) with a statistical test of strand bias, SNVs may be called from deeply sequenced viral populations with high accuracy.
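The strand bias test mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-sided p-value definition, and the way the beta-binomial parameters `alpha` and `beta` are supplied (here simply passed in, rather than estimated from the data) are all assumptions made for illustration. The idea is that if forward-strand read counts follow a beta-binomial distribution, a candidate SNV whose supporting reads fall almost entirely on one strand receives a small p-value and can be flagged as a likely artefact.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shape parameters alpha, beta."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def strand_bias_pvalue(fwd_variant, total_variant, alpha, beta):
    """Hypothetical two-sided test: total probability of all forward-read
    counts that are no more likely than the observed count."""
    pmf = [betabinom_pmf(k, total_variant, alpha, beta)
           for k in range(total_variant + 1)]
    observed = pmf[fwd_variant]
    # Small relative tolerance so ties with the observed value are included.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Assumed parameters: alpha = beta = 20 models a forward fraction centred on 0.5.
balanced = strand_bias_pvalue(50, 100, 20.0, 20.0)  # reads split evenly across strands
skewed = strand_bias_pvalue(98, 100, 20.0, 20.0)    # reads almost all on the forward strand
print(balanced, skewed)
```

In practice the shape parameters would have to be fitted to the observed forward-read distribution of the sample, and the significance threshold chosen with multiple testing in mind; both steps are left out of this sketch.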
