Blazing Signature Filter: a library for fast pairwise similarity comparisons.

Affiliations

Integrative Omics, Pacific Northwest National Laboratory, Richland, 99352, WA, USA.

Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, 99352, WA, USA.

Publication details

BMC Bioinformatics. 2018 Jun 11;19(1):221. doi: 10.1186/s12859-018-2210-6.

DOI: 10.1186/s12859-018-2210-6
PMID: 29890950
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6047367/

Abstract

BACKGROUND

Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data.

RESULTS

The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes.
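The abstract describes the core idea only at a high level: each dataset becomes a binary vector, and similarity is scored with bitwise operators. The Python sketch below illustrates that idea under our own assumptions. It is not the BSF library's actual code or API. In this sketch, a feature is set to 1 when its value meets a hypothetical `threshold`, and the bits are packed into 64-bit words. The coarse score is the number of shared 1-bits, computed as a bitwise AND followed by a population count. The names `binarize`, `shared_bits`, and `candidate_pairs`, and the `min_shared` cutoff, are illustrative only.

```python
from typing import List, Sequence, Tuple

WORD_BITS = 64  # features are packed into 64-bit words, the width of a typical CPU register


def binarize(values: Sequence[float], threshold: float) -> List[int]:
    """Turn a numeric profile into a packed bit vector.

    Hypothetical rule (an assumption, not BSF's): feature i is 1 when values[i] >= threshold.
    Bit i is stored in word i // 64 at bit position i % 64.
    """
    n_words = (len(values) + WORD_BITS - 1) // WORD_BITS
    words = [0] * n_words
    for i, v in enumerate(values):
        if v >= threshold:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def popcount(word: int) -> int:
    """Number of 1-bits in a non-negative integer."""
    return bin(word).count("1")


def shared_bits(a: List[int], b: List[int]) -> int:
    """Coarse similarity: count features set in both vectors via AND + popcount.

    Assumes both vectors were built from profiles of the same length.
    """
    return sum(popcount(x & y) for x, y in zip(a, b))


def candidate_pairs(signatures: List[List[int]], min_shared: int) -> List[Tuple[int, int]]:
    """Return every index pair (i, j), i < j, whose coarse score is at least min_shared."""
    pairs = []
    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            if shared_bits(signatures[i], signatures[j]) >= min_shared:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    profiles = [
        [2.0, 0.1, 1.5, 0.0, 3.0],
        [1.8, 0.0, 1.2, 0.2, 0.1],
        [0.0, 2.5, 0.0, 1.9, 0.0],
    ]
    sigs = [binarize(p, threshold=1.0) for p in profiles]
    print(candidate_pairs(sigs, min_shared=2))  # [(0, 1)]
```

Profiles 0 and 1 share two "on" features (indices 0 and 2), so only that pair passes the cutoff. In a compiled implementation, AND and popcount each process 64 features per machine instruction, so the coarse score is cheap to compute. Python's arbitrary-precision integers show the logic but not that speed. In a filtering workflow, the surviving pairs could then be re-scored with a more expensive similarity measure.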

CONCLUSIONS

The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
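For a sense of scale, an all-against-all comparison of $n$ datasets needs $n(n-1)/2$ pairwise tests, so the work grows quadratically. The value $n = 100{,}000$ below is a hypothetical collection size chosen for illustration. At that size the count already reaches about five billion:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\frac{100\,000 \times 99\,999}{2} = 4\,999\,950\,000 \approx 5 \times 10^{9}
```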

Similar articles

1. GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies.
BMC Genomics. 2018 May 9;19(Suppl 2):89. doi: 10.1186/s12864-018-4460-0.
2. Do it yourself guide to genome assembly.
Brief Funct Genomics. 2016 Jan;15(1):1-9. doi: 10.1093/bfgp/elu042. Epub 2014 Nov 11.
3. Starcode: sequence clustering based on all-pairs search.
Bioinformatics. 2015 Jun 15;31(12):1913-9. doi: 10.1093/bioinformatics/btv053. Epub 2015 Jan 31.
4. Operating on Genomic Ranges Using BEDOPS.
Methods Mol Biol. 2016;1418:267-81. doi: 10.1007/978-1-4939-3578-9_14.
5. Computational solutions for omics data.
Nat Rev Genet. 2013 May;14(5):333-46. doi: 10.1038/nrg3433.
6. Bioinformatics software for biologists in the genomics era.
Bioinformatics. 2007 Jul 15;23(14):1713-7. doi: 10.1093/bioinformatics/btm239. Epub 2007 May 7.
7. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models.
PLoS One. 2016 Nov 28;11(11):e0167047. doi: 10.1371/journal.pone.0167047. eCollection 2016.
8. NEAT: a framework for building fully automated NGS pipelines and analyses.
BMC Bioinformatics. 2016 Feb 1;17:53. doi: 10.1186/s12859-016-0902-3.
9. FSG: Fast String Graph Construction for De Novo Assembly.
J Comput Biol. 2017 Oct;24(10):953-968. doi: 10.1089/cmb.2017.0089. Epub 2017 Jul 17.

Cited by

1. Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding.
Bioinform Adv. 2023 Feb 2;3(1):vbad005. doi: 10.1093/bioadv/vbad005. eCollection 2023.
2. Reproducibility and Transparency by Design.
Mol Cell Proteomics. 2019 Aug 9;18(8 suppl 1):S202-S204. doi: 10.1074/mcp.IP119.001567. Epub 2019 Jul 4.
3. GPU-DAEMON: GPU algorithm design, data management & optimization template for array based big omics data.
Comput Biol Med. 2018 Oct 1;101:163-173. doi: 10.1016/j.compbiomed.2018.08.015. Epub 2018 Aug 16.