The challenge of detecting indels in bacterial genomes from short-read sequencing data.

Authors

Steglich Matthias, Nübel Ulrich

Affiliations

Leibniz Institute DSMZ, Braunschweig, Germany.

Leibniz Institute DSMZ, Braunschweig, Germany; German Center for Infection Research (DZIF), Partner Site, Hannover-Braunschweig, Germany.

Publication information

J Biotechnol. 2017 May 20;250:11-15. doi: 10.1016/j.jbiotec.2017.02.026. Epub 2017 Mar 4.

Abstract

We tested the ability of four software tools to detect insertions and deletions (indels) in a bacterial genome from short sequencing reads. The tools applied the gapped-alignment method (VarScan, FreeBayes), the split-read method (Pindel), or a combinatorial approach with local de novo assembly (ScanIndel). Tests were performed with 151-bp paired-end sequencing reads simulated from a bacterial genome sequence (Clostridioides difficile R20291) carrying predefined indels (indel length, 1-2321 bp). Results varied widely among the tools, and both sensitivity and false-discovery rate depended strongly on indel size. All tools detected short indels (≤29 bp) at sensitivities close to 100%, although Pindel reported up to 20% false calls. In contrast, the gapped-alignment and split-read tools failed to recover large proportions of long indels (>29 bp) even at 120-fold coverage, and Pindel again produced substantial numbers of false-positive calls. ScanIndel, by contrast, detected and reconstructed 97% of long indels on average (95% confidence interval, 88%-99%) while producing negligible numbers of false calls. Hence, the combinatorial approach implemented in ScanIndel recovered the positions, types and sequences of indels with excellent sensitivity and a low false-discovery rate across the full indel length spectrum present in the datasets.
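The two metrics reported above, sensitivity and false-discovery rate, can be computed per indel size class once a set of predefined (true) indels and a set of tool calls are available. The following is a minimal sketch and not the authors' evaluation procedure: the `Indel` record, the `evaluate` function, the toy data, and the exact-match criterion (same position, type and length) are all assumptions made for illustration. The abstract does not state how calls were matched to predefined indels.

```python
from collections import namedtuple

# Hypothetical record for one indel: position, type ("INS"/"DEL"), length in bp.
Indel = namedtuple("Indel", ["pos", "kind", "length"])


def size_class(indel, threshold=29):
    """Classify an indel as short (<= threshold bp) or long (> threshold bp)."""
    return "short" if indel.length <= threshold else "long"


def evaluate(truth, calls, threshold=29):
    """Return sensitivity and false-discovery rate per size class.

    Assumption: a call is a true positive only if it matches a predefined
    indel exactly in position, type and length.
    """
    truth_set, call_set = set(truth), set(calls)
    stats = {}
    for cls in ("short", "long"):
        t = {i for i in truth_set if size_class(i, threshold) == cls}
        c = {i for i in call_set if size_class(i, threshold) == cls}
        tp = len(t & c)
        # Sensitivity: fraction of predefined indels that were recovered.
        sensitivity = tp / len(t) if t else None
        # False-discovery rate: fraction of calls that are not predefined indels.
        fdr = (len(c) - tp) / len(c) if c else None
        stats[cls] = {"sensitivity": sensitivity, "fdr": fdr}
    return stats


# Toy data, invented for illustration only.
truth = [
    Indel(100, "DEL", 1),
    Indel(500, "INS", 12),
    Indel(900, "DEL", 300),
    Indel(2000, "INS", 2321),
]
calls = [
    Indel(100, "DEL", 1),
    Indel(500, "INS", 12),
    Indel(700, "DEL", 3),    # false call in the short class
    Indel(900, "DEL", 300),  # the 2321-bp insertion is missed
]

result = evaluate(truth, calls)
print(result)
```

In this toy example the short class has full sensitivity but one false call out of three, while the long class recovers one of two predefined indels with no false calls. A real evaluation would likely need a tolerance on position, since different tools can report the same indel at shifted coordinates.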
