Blazing Signature Filter: a library for fast pairwise similarity comparisons.

Affiliations

Integrative Omics, Pacific Northwest National Laboratory, Richland, 99352, WA, USA.

Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, 99352, WA, USA.

Publication information

BMC Bioinformatics. 2018 Jun 11;19(1):221. doi: 10.1186/s12859-018-2210-6.

DOI:10.1186/s12859-018-2210-6

PMID:29890950

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6047367/

Abstract

BACKGROUND

Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity, or specificity of such computations will have broad impact, because they allow scientists to explore the wealth of scientific data more completely.

RESULTS

The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying datasets with similar gene expression profiles and comparing annotated genomes.
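
The general idea of binarizing data and comparing it with bit operators can be illustrated with a minimal sketch. This is not the BSF library's API or the authors' implementation: the function names (`binarize`, `shared_count`, `pairwise_shared`), the fixed threshold, the choice of "count of shared set bits" as the coarse similarity score, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations


def binarize(values, threshold):
    """Pack one numeric row into an int: bit i is set when values[i] > threshold.

    The thresholding rule is an assumption for this sketch.
    """
    bits = 0
    for i, v in enumerate(values):
        if v > threshold:
            bits |= 1 << i
    return bits


def shared_count(a, b):
    """Count features set in both signatures (bitwise AND, then popcount)."""
    return bin(a & b).count("1")


def pairwise_shared(rows, threshold):
    """Return {(i, j): shared feature count} for every pair of rows."""
    signatures = [binarize(row, threshold) for row in rows]
    return {
        (i, j): shared_count(signatures[i], signatures[j])
        for i, j in combinations(range(len(signatures)), 2)
    }


# Toy expression-like data: 3 samples x 6 features (made-up numbers).
rows = [
    [2.5, 0.1, 3.0, 0.0, 1.9, 0.2],
    [2.1, 0.3, 2.8, 0.1, 0.4, 0.0],
    [0.0, 2.2, 0.1, 2.9, 0.3, 2.4],
]
result = pairwise_shared(rows, threshold=1.0)
print(result)  # {(0, 1): 2, (0, 2): 0, (1, 2): 0}
```

Once each row is a bit signature, every comparison reduces to one AND and one bit count, which is why a coarse binary score of this kind is cheap to compute across many pairs. A production implementation would likely pack bits into fixed-width machine words rather than Python integers.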

CONCLUSIONS

The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
