Discovering motifs that induce sequencing errors.

Affiliation

Life Sciences Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands.

Publication information

BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S1. doi: 10.1186/1471-2105-14-S5-S1. Epub 2013 Apr 10.

Abstract

BACKGROUND

Elevated sequencing error rates are the predominant obstacle in single-nucleotide polymorphism (SNP) detection, which is a major goal of most current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of errors, certain base calling errors relate to specific sequence patterns. Statistically principled ways to associate sequence patterns with base calling errors have not been previously described. Extant approaches either incur substantial losses in statistical power, because they relate errors to individual genomic positions rather than to motifs, or do not properly distinguish between motif-induced and sequence-unspecific sources of errors.

RESULTS

Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new, more specific ones.

CONCLUSIONS

Checking for error-inducing motifs should be included in SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format).

AVAILABILITY

http://discovering-cse.googlecode.com.
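
As an illustration of the filtering step mentioned in the conclusions, the following is a minimal sketch of how putative SNPs might be screened against a BED track of error-prone positions. It is not the authors' tool. The function names (`load_bed`, `is_error_prone`, `filter_snps`) and the example coordinates are hypothetical, and the sketch assumes SNP positions are 1-based (as in VCF) while BED intervals are 0-based and half-open.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into sorted, merged intervals per chromosome.

    Intervals are kept as (start, end) tuples, 0-based and half-open,
    which is the coordinate convention of the BED format.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_error_prone(intervals, chrom, pos_1based):
    """Return True if a 1-based position lies inside any BED interval."""
    chrom_intervals = intervals.get(chrom, [])
    pos0 = pos_1based - 1  # convert to 0-based BED coordinates
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(chrom_intervals, (pos0, float("inf"))) - 1
    return i >= 0 and chrom_intervals[i][0] <= pos0 < chrom_intervals[i][1]


def filter_snps(snps, intervals):
    """Keep only (chrom, pos_1based) SNPs outside error-prone intervals."""
    return [s for s in snps if not is_error_prone(intervals, s[0], s[1])]


# Hypothetical example data, for illustration only.
bed_lines = ["chr1\t100\t101", "chr1\t200\t205", "chr2\t10\t11"]
putative_snps = [("chr1", 101), ("chr1", 102), ("chr1", 205),
                 ("chr1", 206), ("chr2", 10)]

track = load_bed(bed_lines)
kept = filter_snps(putative_snps, track)
print(kept)
```

In practice one would likely flag rather than silently drop such calls, since a true SNP can coincide with an error-prone position; dropping is used here only to keep the sketch short.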
