SiNPle: Fast and Sensitive Variant Calling for Deep Sequencing Data.

Affiliation

Integrative Biology and Bioinformatics, The Pirbright Institute, Woking GU24 0NF, UK.

Publication information

Genes (Basel). 2019 Jul 25;10(8):561. doi: 10.3390/genes10080561.

Abstract

Current high-throughput sequencing technologies can generate sequence data and provide information on the genetic composition of samples at very high coverage. Deep sequencing approaches enable the detection of rare variants in heterogeneous samples, such as viral quasi-species, but also have the undesired effect of amplifying sequencing errors and artefacts. Distinguishing real variants from such noise is not straightforward. Variant callers that can handle pooled samples can struggle at extremely high read depths, while at lower depths sensitivity is often sacrificed for specificity. In this paper, we propose SiNPle (Simplified Inference of Novel Polymorphisms from Large coveragE), a fast and effective software tool for variant calling. SiNPle is based on a simplified Bayesian approach to compute the posterior probability that a variant is not generated by sequencing errors or PCR artefacts. The Bayesian model takes into account individual base qualities as well as their distribution, the baseline error rates during both the sequencing and the PCR stages, the prior distribution of variant frequencies, and variant strandedness. Our approach leads to an approximate but extremely fast computation of posterior probabilities even for very high coverage data, since the expression for the posterior distribution is a simple analytical formula in terms of summary statistics for the variants appearing at each site in the genome. These statistics can be used to filter out putative SNPs and indels according to the required level of sensitivity. We tested SiNPle on several simulated and real-life viral datasets to show that it is faster and more sensitive than existing methods. The source code for SiNPle is freely available to download and compile, or as a Conda/Bioconda package.
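To make the idea of a per-site posterior concrete, the following is a minimal sketch and not SiNPle's actual model or formula. It assumes a single alternative allele, one mean Phred quality per site, a log-uniform prior on variant frequency evaluated on a numerical grid, and it ignores PCR errors and strandedness entirely. All names (`variant_posterior`, `prior_variant`, `grid_size`) are hypothetical and chosen for illustration only.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def variant_posterior(alt_count, depth, mean_phred, prior_variant=1e-3, grid_size=200):
    """Posterior probability that alt_count non-reference reads out of depth
    come from a real variant rather than from sequencing errors alone.

    Assumption: prior_variant and the log-uniform frequency prior are
    illustrative choices, not values taken from the paper.
    """
    err = phred_to_error(mean_phred)

    # Hypothesis 1: no variant, every alternative read is a sequencing error.
    log_like_error = log_binom_pmf(alt_count, depth, err)

    # Hypothesis 2: a variant at frequency f, averaged over a log-uniform
    # grid of frequencies between 1/depth and 0.5.
    lo, hi = math.log(1.0 / depth), math.log(0.5)
    freqs = [math.exp(lo + (hi - lo) * i / (grid_size - 1)) for i in range(grid_size)]
    log_terms = [
        # A read shows the alternative base if it carries the variant and is
        # read correctly, or carries the reference and is misread.
        log_binom_pmf(alt_count, depth, f * (1 - err) + (1 - f) * err)
        for f in freqs
    ]
    log_like_variant = logsumexp(log_terms) - math.log(grid_size)

    # Bayes' rule in log space.
    log_num = math.log(prior_variant) + log_like_variant
    log_den = logsumexp([log_num, math.log1p(-prior_variant) + log_like_error])
    return math.exp(log_num - log_den)
```

With these assumptions, a site at depth 10,000 and mean quality Q30 expects about 10 erroneous alternative reads, so `variant_posterior(50, 10000, 30)` comes out close to 1 while `variant_posterior(10, 10000, 30)` stays close to 0. The grid integration is the slow part of this sketch; the abstract states that SiNPle avoids this kind of cost by using a closed-form expression in terms of per-site summary statistics.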

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f0/6722845/5eb06d830bdc/genes-10-00561-g001.jpg
