Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Krakow, Poland.
Bioinformatics. 2018 Nov 1;34(21):3719-3726. doi: 10.1093/bioinformatics/bty401.
Biclustering algorithms are commonly used for gene expression data analysis. However, accurately identifying meaningful structures is very challenging, and state-of-the-art methods cannot discover the different patterns of high biological relevance with high accuracy.
This paper introduces a novel biclustering algorithm based on evolutionary computation, a sub-field of artificial intelligence. The method, called EBIC, aims to detect order-preserving patterns in complex data. EBIC is capable of discovering multiple complex patterns with unprecedented accuracy in real gene expression datasets. It is also one of the very few biclustering methods designed for parallel environments with multiple graphics processing units. We demonstrate that EBIC greatly outperforms state-of-the-art biclustering methods, in terms of recovery and relevance, on both synthetic and genetic datasets. EBIC also yields results over 12 times faster than the most accurate reference algorithms.
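To make the notion of an order-preserving pattern concrete, the following minimal Python sketch checks whether a chosen set of rows (genes) and an ordered list of columns (conditions) forms such a pattern. It assumes the common definition in which every selected row must strictly increase along the given column order. The function name `is_order_preserving` and the toy matrix are hypothetical illustrations, not part of EBIC, and the sketch does not reproduce EBIC's evolutionary search or its GPU implementation.

```python
from typing import Sequence


def is_order_preserving(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> bool:
    """Return True if every selected row strictly increases along `cols`.

    Assumption: "order-preserving" means strict increase under one shared
    column ordering; this is an illustrative definition, not EBIC's code.
    """
    for r in rows:
        # Read the row's values in the candidate column order.
        values = [matrix[r][c] for c in cols]
        # Any adjacent pair that fails to increase breaks the pattern.
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


# Toy expression matrix (hypothetical values): 3 genes x 4 conditions.
expression = [
    [0.1, 2.0, 0.7, 1.5],
    [3.0, 9.0, 4.2, 6.1],
    [5.0, 1.0, 2.0, 0.5],
]

column_order = [0, 2, 3, 1]
two_genes = is_order_preserving(expression, rows=[0, 1], cols=column_order)
all_genes = is_order_preserving(expression, rows=[0, 1, 2], cols=column_order)
print(two_genes, all_genes)
```

In this toy example the first two genes follow the same ordering of conditions even though their absolute expression levels differ, while the third gene does not. This kind of structure is what distinguishes order-preserving patterns from biclusters defined by similar values.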
EBIC source code is available on GitHub at https://github.com/EpistasisLab/ebic.
Supplementary data are available at Bioinformatics online.