Empirical assessment of sequencing errors for high throughput pyrosequencing data.

Authors

da Fonseca Paulo G S, Paiva Jorge A P, Almeida Luiz G P, Vasconcelos Ana T R, Freitas Ana T

Affiliation

Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, Lisboa 1000-029, Portugal.

Publication

BMC Res Notes. 2013 Jan 22;6:25. doi: 10.1186/1756-0500-6-25.

DOI:10.1186/1756-0500-6-25
PMID:23339526
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3852801/
Abstract

BACKGROUND

Sequencing-by-synthesis technologies significantly improve over the Sanger method in terms of speed and cost per base. However, they still usually fail to compete in terms of read length and quality. Current high-throughput implementations of the pyrosequencing technique yield reads whose lengths approach those of the capillary electrophoresis method. A less obvious question is whether their quality is affected by platform-specific sequencing errors.

RESULTS

We present an empirical study aimed at assessing the quality and characterising sequencing errors for high throughput pyrosequencing data. We have developed a procedure for extracting sequencing error data from genome assemblies and studying their characteristics, in particular the length distribution of indel gaps and their relation to the sequence contexts where they occur. We used this procedure to analyse data from three prokaryotic genomes sequenced with the GS FLX technology. We also compared two models previously employed with success for peptide sequence alignment.
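The abstract does not spell out how the error data are extracted, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each read is available as a gapped row aligned against a gapped consensus row (equal-length strings with `-` for gaps), and it records, for every run of gap columns, its length and the length of the consensus homopolymer run at that position. The names `Indel`, `run_length`, and `indel_events` are hypothetical, and a gap spanning several different bases is attributed to its first base as a simplification.

```python
from collections import Counter
from typing import Iterator, NamedTuple

GAP = "-"


class Indel(NamedTuple):
    kind: str     # "deletion": bases missing from the read; "insertion": extra bases
    length: int   # number of consecutive gap columns
    base: str     # first base of the deleted or inserted segment
    context: int  # length of the consensus homopolymer run of `base` at the gap


def run_length(seq: str, pos: int, base: str) -> int:
    """Length of the maximal run of `base` in `seq` that touches index `pos`."""
    left = pos - 1
    while left >= 0 and seq[left] == base:
        left -= 1
    right = pos
    while right < len(seq) and seq[right] == base:
        right += 1
    return right - left - 1


def indel_events(consensus: str, read: str) -> Iterator[Indel]:
    """Yield one Indel per run of gap columns in a pairwise gapped alignment."""
    if len(consensus) != len(read):
        raise ValueError("aligned rows must have equal length")
    ungapped = consensus.replace(GAP, "")
    col = 0      # column in the alignment
    ref_pos = 0  # index of the next base in the ungapped consensus
    while col < len(consensus):
        c, r = consensus[col], read[col]
        if c == GAP and r == GAP:      # empty column, nothing to record
            col += 1
        elif c != GAP and r != GAP:    # match or substitution
            col += 1
            ref_pos += 1
        elif r == GAP:                 # gap in the read: deletion
            start = col
            while col < len(read) and read[col] == GAP and consensus[col] != GAP:
                col += 1
            base = consensus[start]
            yield Indel("deletion", col - start, base,
                        run_length(ungapped, ref_pos, base))
            ref_pos += col - start
        else:                          # gap in the consensus: insertion
            start = col
            while col < len(read) and consensus[col] == GAP and read[col] != GAP:
                col += 1
            base = read[start]
            yield Indel("insertion", col - start, base,
                        run_length(ungapped, ref_pos, base))


# Toy alignment (made up for illustration): a 2-base under-call in a TTTT run,
# followed by a 1-base over-call next to a single A.
events = list(indel_events("ACGTTTTA-CG", "ACGTT--AACG"))
histogram = Counter((e.kind, e.length, e.context) for e in events)
```

Tallying `(kind, length, context)` over all reads of an assembly would give the kind of gap-length distribution per homopolymer context that the abstract describes.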

CONCLUSIONS

We observed an overall very low error rate in the analysed data, with indel errors being much more abundant than substitutions. We also observed a dependence between the length of the gaps and that of the homopolymer context where they occur. As with protein alignments, a power-law model seems to approximate the indel errors more accurately, although the results are not so conclusive as to justify a departure from the commonly used affine gap penalty scheme. In either case, however, our procedure can be used to estimate more realistic error model parameters.
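To make the two gap models concrete: if a penalty is read as a negative log-probability, an affine penalty (linear in the gap length k) corresponds to a geometric gap-length distribution, while a power-law distribution P(k) ∝ k^(-α) corresponds to a penalty that grows with log k. The sketch below is an illustration under that assumption, not the fitting method used in the paper. It fits both forms to observed gap-length counts by ordinary least squares on log frequencies, which is a crude comparison (a maximum-likelihood fit would be more rigorous). All function names and parameter names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def affine_penalty(k: int, gap_open: float, gap_extend: float) -> float:
    """Affine scheme: cost grows linearly with gap length k."""
    return gap_open + gap_extend * k


def log_penalty(k: int, gap_open: float, gap_scale: float) -> float:
    """Power-law scheme: cost grows with the logarithm of gap length k."""
    return gap_open + gap_scale * math.log(k)


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x; return (slope, intercept, residual sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def compare_gap_models(counts: Dict[int, float]) -> Dict[str, Tuple[float, float, float]]:
    """Fit log-frequency against k (affine) and against log k (power law)."""
    lengths = sorted(k for k, c in counts.items() if k >= 1 and c > 0)
    if len(lengths) < 2:
        raise ValueError("need counts for at least two gap lengths")
    total = sum(counts[k] for k in lengths)
    log_freq = [math.log(counts[k] / total) for k in lengths]
    return {
        "affine": least_squares([float(k) for k in lengths], log_freq),
        "power_law": least_squares([math.log(k) for k in lengths], log_freq),
    }
```

The model with the smaller residual sum of squares describes the observed gap lengths better on this scale; the negated slope of the chosen fit can then serve as a starting value for `gap_extend` or `gap_scale`.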
