ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems.

Affiliation

Grupo de Arquitectura de Computadores, Universidade da Coruña, A Coruña, Spain.

Publication information

PLoS One. 2018 Apr 2;13(4):e0194361. doi: 10.1371/journal.pone.0194361. eCollection 2018.

DOI:10.1371/journal.pone.0194361

PMID:29608567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5880350/

Abstract

Biclustering techniques are gaining attention in the analysis of large-scale datasets because they identify two-dimensional submatrices in which both rows and columns are correlated. In this work we present ParBiBit, a parallel tool that accelerates the search for interesting biclusters in binary datasets, which are very popular in fields such as genetics, marketing, and text mining. It is based on the state-of-the-art sequential Java tool BiBit, which several studies have shown to be accurate, especially in scenarios that yield many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and produces the same results. Nevertheless, our tool significantly improves performance thanks to an efficient C++11 implementation that supports threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which consist of several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. The source code, written in C++ and MPI for Linux systems, and a reference manual are available at https://sourceforge.net/projects/parbibit/.
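To make "grouping the binary information into patterns" more concrete, here is a minimal sequential sketch of a pattern-based search in the spirit of BiBit. It is written in Python for brevity and is not the ParBiBit source, which is C++11 with threads and MPI. The function names `encode_row` and `find_biclusters` and the parameters `min_rows` and `min_cols` are assumptions made for this illustration. The idea: pack each row into a bit word, AND every pair of rows to obtain a candidate pattern, and collect all rows that contain that pattern.

```python
from itertools import combinations


def encode_row(bits):
    """Pack a sequence of 0/1 values into one int (bit k stands for column k)."""
    value = 0
    for col, bit in enumerate(bits):
        if bit:
            value |= 1 << col
    return value


def find_biclusters(matrix, min_rows=2, min_cols=2):
    """Return (row_indices, col_indices) pairs found by pairwise row patterns."""
    n_cols = len(matrix[0]) if matrix else 0
    words = [encode_row(row) for row in matrix]
    seen = set()   # patterns already expanded, so each is reported only once
    found = []
    for i, j in combinations(range(len(words)), 2):
        pattern = words[i] & words[j]          # columns set to 1 in both rows
        if bin(pattern).count("1") < min_cols or pattern in seen:
            continue
        seen.add(pattern)
        # A row joins the bicluster when it has every column of the pattern.
        rows = [r for r, w in enumerate(words) if (w & pattern) == pattern]
        if len(rows) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> c) & 1]
            found.append((rows, cols))
    return found


if __name__ == "__main__":
    example = [
        [1, 1, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 0],
    ]
    for rows, cols in find_biclusters(example):
        print("rows", rows, "cols", cols)
```

Each row pair in the outer loop is processed independently of the others, apart from the shared `seen` set, which is why this kind of search lends itself to being split across threads and processes. How ParBiBit actually partitions the work is described in the paper and is not reproduced here.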

Similar articles

ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems.

PLoS One. 2018 Apr 2;13(4):e0194361. doi: 10.1371/journal.pone.0194361. eCollection 2018.

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

Bioinformatics. 2016 Dec 15;32(24):3826-3828. doi: 10.1093/bioinformatics/btw558. Epub 2016 Sep 16.

A biclustering algorithm for extracting bit-patterns from binary datasets.

Bioinformatics. 2011 Oct 1;27(19):2738-45. doi: 10.1093/bioinformatics/btr464. Epub 2011 Aug 8.

MPIGeneNet: Parallel Calculation of Gene Co-Expression Networks on Multicore Clusters.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Sep-Oct;15(5):1732-1737. doi: 10.1109/TCBB.2017.2761340. Epub 2017 Oct 10.

ParDRe: faster parallel duplicated reads removal tool for sequencing studies.

Bioinformatics. 2016 May 15;32(10):1562-4. doi: 10.1093/bioinformatics/btw038. Epub 2016 Jan 22.

ParRADMeth: Identification of Differentially Methylated Regions on Multicore Clusters.

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2041-2049. doi: 10.1109/TCBB.2022.3230473. Epub 2023 Jun 5.

Biclustering sparse binary genomic data.

J Comput Biol. 2008 Dec;15(10):1329-45. doi: 10.1089/cmb.2008.0066.

Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

J Bioinform Comput Biol. 2016 Apr;14(2):1641008. doi: 10.1142/S0219720016410080.

BicSPAM: flexible biclustering using sequential patterns.

BMC Bioinformatics. 2014 May 6;15:130. doi: 10.1186/1471-2105-15-130.

SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining.

Methods Mol Biol. 2018;1807:95-111. doi: 10.1007/978-1-4939-8561-6_8.

Cited by

Biclustering data analysis: a comprehensive survey.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae342.

EBIC: an open source software for high-dimensional and big data analyses.

Bioinformatics. 2019 Sep 1;35(17):3181-3183. doi: 10.1093/bioinformatics/btz027.

References

A GPU-accelerated algorithm for biclustering analysis and detection of condition-dependent coexpression network modules.

Sci Rep. 2017 Jun 23;7(1):4162. doi: 10.1038/s41598-017-04070-4.

A systematic comparative evaluation of biclustering techniques.

BMC Bioinformatics. 2017 Jan 23;18(1):55. doi: 10.1186/s12859-017-1487-1.

MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

Bioinformatics. 2016 Dec 15;32(24):3826-3828. doi: 10.1093/bioinformatics/btw558. Epub 2016 Sep 16.

Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data.

BMC Bioinformatics. 2016 Aug 31;17(1):330. doi: 10.1186/s12859-016-1200-9.

Biclustering on expression data: A review.

J Biomed Inform. 2015 Oct;57:163-80. doi: 10.1016/j.jbi.2015.06.028. Epub 2015 Jul 6.

Identification of bicluster regions in a binary matrix and its applications.

PLoS One. 2013 Aug 5;8(8):e71680. doi: 10.1371/journal.pone.0071680. Print 2013.

A comparative analysis of biclustering algorithms for gene expression data.

Brief Bioinform. 2013 May;14(3):279-92. doi: 10.1093/bib/bbs032. Epub 2012 Jul 6.

A biclustering algorithm for extracting bit-patterns from binary datasets.

Bioinformatics. 2011 Oct 1;27(19):2738-45. doi: 10.1093/bioinformatics/btr464. Epub 2011 Aug 8.

A systematic comparison and evaluation of biclustering methods for gene expression data.

Bioinformatics. 2006 May 1;22(9):1122-9. doi: 10.1093/bioinformatics/btl060. Epub 2006 Feb 24.

Prediction of central nervous system embryonal tumour outcome based on gene expression.

Nature. 2002 Jan 24;415(6870):436-42. doi: 10.1038/415436a.
