

Systematic exploration of error sources in pyrosequencing flowgram data.

Affiliation

Institute of Marine Research, P.O. Box 1870, N-5817 Bergen, Norway.

Publication

Bioinformatics. 2011 Jul 1;27(13):i304-9. doi: 10.1093/bioinformatics/btr251.

Abstract

MOTIVATION

454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types.

RESULTS

By exploring 454 raw data, we quantify to what extent different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors likely to originate from other stages of the sequencing process. We use our findings to extend the flowsim pipeline with functionalities to simulate these errors, and thus enable a more realistic simulation of 454 pyrosequencing data with flowsim.
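The abstract does not spell out flowsim's error models. The Python sketch below is a rough illustration, not flowsim's implementation, of how flow values that drift away from whole numbers become homopolymer length errors. It assumes the T, A, C, G flow cycle, a hypothetical Gaussian noise model whose standard deviation grows linearly with homopolymer length (parameter values are made up), and a naive base caller that rounds each flow value. All function names are illustrative.

```python
import random

# Assumption: the nucleotide flow cycle T, A, C, G.
FLOW_ORDER = "TACG"


def sequence_to_flowgram(seq, flow_order=FLOW_ORDER):
    """Ideal (noise-free) flowgram: the homopolymer length incorporated in each flow."""
    seq = seq.upper()
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains bases that are not in the flow order")
    flows, i, f = [], 0, 0
    while i < len(seq):
        base = flow_order[f % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        flows.append(float(run))  # 0.0 means nothing was incorporated in this flow
        f += 1
    return flows


def add_noise(flows, sd_per_base=0.1, sd_empty=0.05, rng=None):
    """Hypothetical noise model: Gaussian flow values whose standard deviation
    grows linearly with homopolymer length; negative signals are clipped to 0."""
    rng = rng if rng is not None else random.Random()
    noisy = []
    for n in flows:
        sd = sd_empty if n == 0 else sd_per_base * n
        noisy.append(max(0.0, rng.gauss(n, sd)))
    return noisy


def call_bases(flows, flow_order=FLOW_ORDER):
    """Naive base caller: round each flow value to the nearest homopolymer length."""
    calls = []
    for f, value in enumerate(flows):
        length = max(0, round(value))
        calls.append(flow_order[f % len(flow_order)] * length)
    return "".join(calls)


def miscall_rate(length, trials=10_000, seed=0, **noise_params):
    """Fraction of simulated flows for a true homopolymer of `length` that the
    rounding caller reports with the wrong length."""
    rng = random.Random(seed)
    noisy = add_noise([float(length)] * trials, rng=rng, **noise_params)
    return sum(round(v) != length for v in noisy) / trials


if __name__ == "__main__":
    ideal = sequence_to_flowgram("TTACCCGGA")
    print(ideal)              # [2.0, 1.0, 3.0, 2.0, 0.0, 1.0]
    print(call_bases(ideal))  # TTACCCGGA
    # One overshooting flow (3.6 for a true 3-mer) turns into an extra C:
    print(call_bases([2.0, 1.0, 3.6, 2.0, 0.0, 1.0]))  # TTACCCCGGA
    for n in (1, 3, 5):
        print(n, miscall_rate(n))
```

Under these made-up parameters, `miscall_rate(5)` comes out far larger than `miscall_rate(1)`, which qualitatively mirrors the well-known homopolymer length inaccuracies. The other error sources the authors identify are not modeled in this sketch.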

AVAILABILITY

The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim.

CONTACT

susanne.balzer@imr.no.


