Classification of DNA sequences using Bloom filters.

Affiliation

Science for Life Laboratory, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.

Publication information

Bioinformatics. 2010 Jul 1;26(13):1595-600. doi: 10.1093/bioinformatics/btq230. Epub 2010 May 13.

DOI:10.1093/bioinformatics/btq230
PMID:20472541
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2887045/
Abstract

MOTIVATION

New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed.

RESULTS

A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.
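
The general idea of classifying a sequence against a reference with a Bloom filter can be sketched as follows. This is a minimal illustration, not the FACS implementation: the abstract does not specify the word length, filter size, number of hash functions or match cutoff, so `k`, `num_bits`, `num_hashes` and `min_fraction` below are all assumed values, and the names `BloomFilter`, `build_reference_filter` and `classify` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by several hash positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Return all overlapping words of length k in seq (upper-cased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_reference_filter(reference, k, num_bits=1 << 16, num_hashes=4):
    """Insert every k-mer of the reference into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for word in kmers(reference, k):
        bf.add(word)
    return bf


def classify(read, bf, k, min_fraction=0.8):
    """True if at least min_fraction of the read's k-mers hit the filter."""
    words = kmers(read, k)
    if not words:  # read shorter than k: nothing to test
        return False
    hits = sum(1 for word in words if word in bf)
    return hits / len(words) >= min_fraction


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATTACAGGCTAACGT"
    bf = build_reference_filter(reference, k=12)
    print(classify(reference[5:35], bf, k=12))  # read drawn from the reference
    print(classify("T" * 30, bf, k=12))         # read unrelated to the reference
```

Because a Bloom filter never produces false negatives, a read copied exactly from the reference always has all of its k-mers found. False positives are possible, which is why this sketch requires a fraction of matching k-mers rather than a single hit.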

AVAILABILITY

Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/

CONTACTS

henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a56d/2887045/10869b5737e4/btq230f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a56d/2887045/ac576619082c/btq230f1.jpg
