Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin.
Bioinformatics. 2010 Mar 15;26(6):722-9. doi: 10.1093/bioinformatics/btq027. Epub 2010 Feb 9.
MOTIVATION: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.

RESULTS: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (eir), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platforms indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.

CONTACT: peter.krawitz@googlemail.com; peter.robinson@charite.de

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
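The abstract does not spell out how the equivalent indel region is computed. As a rough illustration of the underlying idea, an indel inside a repetitive stretch can be placed at several positions that all yield the same resulting sequence, and the eir can be thought of as the range of those positions. The sketch below is not the authors' implementation: the function name `equivalent_indel_region`, its 0-based coordinates, and its restriction to deletions are assumptions made for this example.

```python
def equivalent_indel_region(ref: str, pos: int, length: int) -> tuple[int, int]:
    """Return the leftmost and rightmost 0-based start positions at which
    deleting `length` bases from `ref` gives the same sequence as deleting
    them at `pos`. Hypothetical helper; handles deletions only."""
    if length <= 0 or pos < 0 or pos + length > len(ref):
        raise ValueError("deletion does not fit inside the reference")

    # Shifting the deletion one base to the left leaves the result unchanged
    # only if the base entering the window equals the base leaving it.
    left = pos
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # The same argument, mirrored, for shifting to the right.
    right = pos
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Remove `length` bases starting at `pos` (0-based)."""
    return ref[:pos] + ref[pos + length:]


if __name__ == "__main__":
    ref = "GATTTTCA"
    left, right = equivalent_indel_region(ref, 3, 1)
    # Every start position in [left, right] produces the same sequence.
    print(left, right, {apply_deletion(ref, p, 1) for p in range(left, right + 1)})
```

Two reads that report the same deletion at different positions within such a range could then be treated as supporting a single indel call, rather than being counted as separate events.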