
Recount: expectation maximization based error correction tool for next generation sequencing data.

Authors

Wijaya Edward, Frith Martin C, Suzuki Yutaka, Horton Paul

Affiliation

AIST, Computational Biology Research Center, 2-42 Aomi, Koutou-Ku, Tokyo 135-0064, Japan.

Publication

Genome Inform. 2009 Oct;23(1):189-201.

PMID: 20180274

Abstract

Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their outputs by introducing false reads and reducing the quantity of the real reads. Although methods developed for SAGE data can reduce these false counts to a considerable degree, until now they have not been implemented in a scalable way. Recently, a program named FREC has been developed to address this problem for next generation sequencing data. In this paper, we introduce RECOUNT, our implementation of an Expectation Maximization algorithm for tag count correction, and compare it to FREC. Using both the reference genome and simulated data, we find that RECOUNT performs as well as or better than FREC, while using much less memory (e.g. 5 GB vs. 75 GB). Furthermore, we report the first analysis of tag count correction with real data in the context of gene expression analysis. Our results show that tag count correction not only increases the number of mappable tags, but can make a real difference in the biological interpretation of next generation sequencing data. RECOUNT is an open-source C++ program available at http://seq.cbrc.jp/recount.
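
To make the idea of Expectation Maximization for tag count correction concrete, here is a minimal Python sketch. It is not RECOUNT's code or its error model. It assumes equal-length tags, a single uniform per-base substitution error rate, and at most one error per read, and the names `hamming_neighbors` and `em_correct_counts` are hypothetical.

```python
from collections import defaultdict


def hamming_neighbors(tag, alphabet="ACGT"):
    """Yield every tag that differs from `tag` at exactly one position."""
    for i, base in enumerate(tag):
        for other in alphabet:
            if other != base:
                yield tag[:i] + other + tag[i + 1:]


def em_correct_counts(observed, error_rate=0.01, iterations=50):
    """Estimate true tag counts from observed counts (hypothetical helper).

    Assumptions (not taken from the paper): equal-length tags, a uniform
    per-base substitution error rate, and at most one error per read.
    """
    length = len(next(iter(observed)))
    p_exact = (1 - error_rate) ** length
    p_one_error = (1 - error_rate) ** (length - 1) * (error_rate / 3)

    # Candidate true sources for each observed tag: the tag itself plus
    # any observed tag exactly one substitution away.
    sources = {
        tag: [(tag, p_exact)]
        + [(n, p_one_error) for n in hamming_neighbors(tag) if n in observed]
        for tag in observed
    }

    # Start from the observed counts as the initial abundance estimate.
    expected = {tag: float(count) for tag, count in observed.items()}
    for _ in range(iterations):
        updated = defaultdict(float)
        for tag, count in observed.items():
            # E-step: posterior weight of each candidate source of this tag.
            weights = [(src, expected[src] * p) for src, p in sources[tag]]
            norm = sum(w for _, w in weights)
            # M-step (accumulated): reassign this tag's reads to its sources.
            for src, w in weights:
                updated[src] += count * w / norm
        expected = {tag: updated[tag] for tag in observed}
    return expected


# Toy input: one abundant tag, one rare tag a single substitution away
# from it, and one unrelated tag.
observed = {"ACGTACGTAC": 1000, "ACGTACGTAT": 3, "TTTTTTTTTT": 500}
corrected = em_correct_counts(observed)
```

In this toy input, most of the rare tag's count is reassigned to the abundant tag one substitution away, the unrelated tag keeps its count, and the total number of reads is preserved.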

Similar articles

1
Recount: expectation maximization based error correction tool for next generation sequencing data.
Genome Inform. 2009 Oct;23(1):189-201.
2
Statistical modeling of sequencing errors in SAGE libraries.
Bioinformatics. 2004 Aug 4;20 Suppl 1:i31-9. doi: 10.1093/bioinformatics/bth924.
3
EDAR: an efficient error detection and removal algorithm for next generation sequencing data.
J Comput Biol. 2010 Nov;17(11):1549-60. doi: 10.1089/cmb.2010.0127. Epub 2010 Oct 25.
4
Correction of sequencing errors in a mixed set of reads.
Bioinformatics. 2010 May 15;26(10):1284-90. doi: 10.1093/bioinformatics/btq151. Epub 2010 Apr 8.
5
[Transcriptomes for serial analysis of gene expression].
J Soc Biol. 2002;196(4):303-7.
6
Optimal spliced alignments of short sequence reads.
Bioinformatics. 2008 Aug 15;24(16):i174-80. doi: 10.1093/bioinformatics/btn300.
7
BING: biomedical informatics pipeline for Next Generation Sequencing.
J Biomed Inform. 2010 Jun;43(3):428-34. doi: 10.1016/j.jbi.2009.11.003. Epub 2009 Nov 28.
8
De novo sequencing of plant genomes using second-generation technologies.
Brief Bioinform. 2009 Nov;10(6):609-18. doi: 10.1093/bib/bbp039.
9
Reptile: representative tiling for short read error correction.
Bioinformatics. 2010 Oct 15;26(20):2526-33. doi: 10.1093/bioinformatics/btq468. Epub 2010 Aug 16.
10
Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics.
Mol Biol Evol. 2009 Dec;26(12):2731-44. doi: 10.1093/molbev/msp188. Epub 2009 Aug 25.

Cited by

1
DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing.
PLoS One. 2017 Jul 27;12(7):e0181463. doi: 10.1371/journal.pone.0181463. eCollection 2017.
2
Complete Genome Sequence of Algoriphagus sp. Strain M8-2, Isolated from a Brackish Lake.
Genome Announc. 2016 May 12;4(3):e00347-16. doi: 10.1128/genomeA.00347-16.
3
The Plasmodiophora brassicae genome reveals insights in its life cycle and ancestry of chitin synthases.
Sci Rep. 2015 Jun 18;5:11153. doi: 10.1038/srep11153.
4
Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.
Brief Bioinform. 2016 Jan;17(1):154-79. doi: 10.1093/bib/bbv029. Epub 2015 May 29.
5
Draft Genome Sequence of Pseudomonas abietaniphila KF717 (NBRC 110669), Isolated from Biphenyl-Contaminated Soil in Japan.
Genome Announc. 2015 Mar 19;3(2):e00059-15. doi: 10.1128/genomeA.00059-15.
6
Complete Genome Sequence of a Dimethyl Sulfide-Utilizing Bacterium, Acinetobacter guillouiae Strain 20B (NBRC 110550).
Genome Announc. 2014 Oct 16;2(5):e01048-14. doi: 10.1128/genomeA.01048-14.
7
Complete Genome Sequence of Polychlorinated Biphenyl Degrader Comamonas testosteroni TK102 (NBRC 109938).
Genome Announc. 2014 Sep 11;2(5):e00865-14. doi: 10.1128/genomeA.00865-14.
8
Whole-genome sequence variation, population structure and demographic history of the Dutch population.
Nat Genet. 2014 Aug;46(8):818-25. doi: 10.1038/ng.3021. Epub 2014 Jun 29.
9
Complete Genome Sequence of the Thermophilic Polychlorinated Biphenyl Degrader Geobacillus sp. Strain JF8 (NBRC 109937).
Genome Announc. 2014 Jan 23;2(1):e01213-13. doi: 10.1128/genomeA.01213-13.
10
BLESS: bloom filter-based error correction solution for high-throughput sequencing reads.
Bioinformatics. 2014 May 15;30(10):1354-62. doi: 10.1093/bioinformatics/btu030. Epub 2014 Jan 21.