Orzechowski Patryk, Moore Jason H
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
Department of Automatics and Robotics, AGH University of Science and Technology, Krakow, Poland.
Bioinformatics. 2019 Sep 1;35(17):3181-3183. doi: 10.1093/bioinformatics/btz027.
In this paper, we present an open-source package containing the latest release of Evolutionary-based BIClustering (EBIC), a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is full support for multiple graphics processing units (GPUs), which makes it possible to run large genomic data mining analyses efficiently. Further enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis.
EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436 444 rows. For the largest dataset, we observed an over 6.6-fold speedup in computation time on a cluster of eight GPUs compared with running the method on a single GPU, which indicates that the method scales well across multiple GPUs.
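To put the reported speedup in context, it can be converted into a parallel efficiency, that is, the speedup divided by the number of devices. The short sketch below is only illustrative arithmetic on the two figures quoted above (6.6-fold, eight GPUs). The function name `parallel_efficiency` is a hypothetical helper and is not part of the EBIC package.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Return speedup divided by device count (1.0 means ideal linear scaling)."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


# Figures quoted in the text: over 6.6-fold speedup on eight GPUs versus one GPU.
# Because the reported speedup is a lower bound, so is the resulting efficiency.
efficiency = parallel_efficiency(6.6, 8)
print(f"parallel efficiency of at least {efficiency:.1%}")
```

Under these figures the efficiency is at least about 82.5%, meaning each of the eight GPUs contributes roughly four fifths of what ideal linear scaling would give.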
The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic. Installation and usage instructions are also available online.
Supplementary data are available at Bioinformatics online.