naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.

Authors

Kao Wei-Chun, Song Yun S

Affiliation

Department of EECS, University of California, Berkeley, California, USA.

Publication

J Comput Biol. 2011 Mar;18(3):365-77. doi: 10.1089/cmb.2010.0247.

DOI: 10.1089/cmb.2010.0247
PMID: 21385040

Abstract

Immense amounts of raw instrument data (i.e., images of fluorescence) are currently being generated using ultra high-throughput sequencing platforms. An important computational challenge associated with this rapid advancement is to develop efficient algorithms that can extract accurate sequence information from raw data. To address this challenge, we recently introduced a novel model-based base-calling algorithm that is fully parametric and has several advantages over previously proposed methods. Our original algorithm, called BayesCall, significantly reduced the error rate, particularly in the later cycles of a sequencing run, and also produced useful base-specific quality scores with a high discrimination ability. However, BayesCall is too computationally expensive to be of broad practical use. In this article, we build on our previous model-based approach to devise an efficient base-calling algorithm that is orders of magnitude faster than BayesCall, while still maintaining a comparably high level of accuracy. Our new algorithm is called naiveBayesCall, and it utilizes approximation and optimization methods to achieve scalability. We describe the performance of naiveBayesCall and demonstrate how improved base-calling accuracy may facilitate de novo assembly and SNP detection when the sequence coverage depth is low to moderate.
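
The abstract mentions base-specific quality scores without saying how they are computed. As a rough illustration only, and not the authors' method, the sketch below shows the standard Phred convention that a probabilistic base caller could use: given posterior probabilities for A, C, G and T at one cycle, call the most probable base and report Q = -10 log10(P(error)). The function name `call_base`, the dictionary input, and the quality cap of 40 are assumptions made for this example.

```python
import math


def call_base(posteriors, max_quality=40):
    """Call one base and a Phred-scaled quality from posterior probabilities.

    `posteriors` maps each base ("A", "C", "G", "T") to a non-negative weight.
    The weights are normalised here, so they need not sum exactly to 1.
    This is a generic illustration, not the naiveBayesCall model itself.
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior weights must have a positive sum")

    # The called base is the one with the largest posterior probability.
    base = max(posteriors, key=posteriors.get)

    # Error probability is the posterior mass on all other bases.
    p_error = 1.0 - posteriors[base] / total
    if p_error <= 0.0:
        # A zero error probability would give infinite quality; cap it.
        return base, max_quality

    quality = -10.0 * math.log10(p_error)
    return base, min(max_quality, int(round(quality)))


if __name__ == "__main__":
    example = {"A": 0.99, "C": 0.005, "G": 0.003, "T": 0.002}
    print(call_base(example))
```

Under this convention a 1% error probability corresponds to Q20 and a 0.1% error probability to Q30, so a score with high discrimination ability is one that separates correct from incorrect calls well.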

Similar articles

1. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing.
   J Comput Biol. 2011 Mar;18(3):365-77. doi: 10.1089/cmb.2010.0247.
2. BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing.
   Genome Res. 2009 Oct;19(10):1884-95. doi: 10.1101/gr.095299.109. Epub 2009 Aug 6.
3. TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.
   Bioinformatics. 2011 Sep 1;27(17):2330-7. doi: 10.1093/bioinformatics/btr393. Epub 2011 Jun 30.
4. PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
   Genet Epidemiol. 2017 Jul;41(5):375-387. doi: 10.1002/gepi.22048. Epub 2017 May 31.
5. OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing.
   Bioinformatics. 2012 Jul 1;28(13):1677-83. doi: 10.1093/bioinformatics/bts256. Epub 2012 May 7.
6. Improvement in detection of minor alleles in next generation sequencing by base quality recalibration.
   BMC Genomics. 2016 Feb 27;17:139. doi: 10.1186/s12864-016-2463-2.
7. Probabilistic base calling of Solexa sequencing data.
   BMC Bioinformatics. 2008 Oct 13;9:431. doi: 10.1186/1471-2105-9-431.
8. A Comparison of Base-calling Algorithms for Illumina Sequencing Technology.
   Brief Bioinform. 2016 Sep;17(5):786-95. doi: 10.1093/bib/bbv088. Epub 2015 Oct 5.
9. Base calling for high-throughput short-read sequencing: dynamic programming solutions.
   BMC Bioinformatics. 2013 Apr 15;14:129. doi: 10.1186/1471-2105-14-129.
10. De novo sequencing and variant calling with nanopores using PoreSeq.
    Nat Biotechnol. 2015 Oct;33(10):1087-91. doi: 10.1038/nbt.3360. Epub 2015 Sep 9.

Cited by

1. Pan-cancer analysis of systematic batch effects on somatic sequence variations.
   BMC Bioinformatics. 2017 Apr 11;18(1):211. doi: 10.1186/s12859-017-1627-7.
2. Improvement in detection of minor alleles in next generation sequencing by base quality recalibration.
   BMC Genomics. 2016 Feb 27;17:139. doi: 10.1186/s12864-016-2463-2.
3. Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data.
   Stat Biosci. 2013 May;5(1):3-25. doi: 10.1007/s12561-012-9067-4.
4. BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution.
   Bioinformatics. 2014 May 1;30(9):1214-9. doi: 10.1093/bioinformatics/btu010. Epub 2014 Jan 9.
5. High-resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions.
   Nucleic Acids Res. 2013 Dec;41(22):e205. doi: 10.1093/nar/gkt1070. Epub 2013 Nov 7.
6. Base calling for high-throughput short-read sequencing: dynamic programming solutions.
   BMC Bioinformatics. 2013 Apr 15;14:129. doi: 10.1186/1471-2105-14-129.
7. A beginners guide to SNP calling from high-throughput DNA-sequencing data.
   Hum Genet. 2012 Oct;131(10):1541-54. doi: 10.1007/s00439-012-1213-z. Epub 2012 Aug 11.
8. ParticleCall: a particle filter for base calling in next-generation sequencing systems.
   BMC Bioinformatics. 2012 Jul 9;13:160. doi: 10.1186/1471-2105-13-160.
9. OnlineCall: fast online parameter estimation and base calling for Illumina's next-generation sequencing.
   Bioinformatics. 2012 Jul 1;28(13):1677-83. doi: 10.1093/bioinformatics/bts256. Epub 2012 May 7.
10. All Your Base: a fast and accurate probabilistic approach to base calling.
    Genome Biol. 2012 Feb 29;13(2):R13. doi: 10.1186/gb-2012-13-2-r13.