Research Center for Mathematics and Interdisciplinary Sciences.
School of Mathematics, Shandong University, Jinan 250100, China.
Bioinformatics. 2020 Dec 22;36(20):5054-5060. doi: 10.1093/bioinformatics/btaa630.
Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in both the accuracy and the efficiency with which they recognize the various kinds of complex biclusters hidden in ever-larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm for identifying various forms of complex biclusters in gene expression datasets.
We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters in which the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster simply by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of the implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters nearly perfectly, outperforming all the prominent tools it was compared with in terms of accuracy and robustness to noise and to overlaps between clusters. Moreover, RecBic was also superior at identifying functionally related genes in real gene expression datasets.
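To make the notion of a trend-preserving bicluster and of seed-and-grow by comparison more concrete, the following is a minimal illustrative sketch, not RecBic's actual procedure. It assumes that "trend-preserving" means every selected gene (row) orders the selected conditions (columns) identically, and that a seed is a pair of columns. The names `is_trend_preserving`, `rows_following`, `grow_from_seed` and the parameter `min_rows` are hypothetical and do not come from the RecBic code base.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def is_trend_preserving(matrix: Matrix, rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Return True if every selected row ranks the selected columns in the same order."""
    def column_order(r: int) -> Tuple[int, ...]:
        # Columns sorted by the row's expression values (ties keep input order).
        return tuple(sorted(cols, key=lambda c: matrix[r][c]))

    first = column_order(rows[0])
    return all(column_order(r) == first for r in rows[1:])


def rows_following(matrix: Matrix, rows: Sequence[int], chain: Sequence[int]) -> List[int]:
    """Keep the rows whose values strictly increase along the column chain."""
    return [
        r for r in rows
        if all(matrix[r][a] < matrix[r][b] for a, b in zip(chain, chain[1:]))
    ]


def grow_from_seed(matrix: Matrix, seed: Tuple[int, int], min_rows: int) -> Tuple[List[int], List[int]]:
    """Greedily extend a two-column seed using only pairwise comparisons.

    Assumption (illustrative only): a column is added at the chain position
    that retains the most rows, and only if at least `min_rows` rows remain.
    """
    chain = list(seed)
    rows = rows_following(matrix, range(len(matrix)), chain)
    for c in range(len(matrix[0])):
        if c in chain:
            continue
        best_chain, best_rows = None, []
        for pos in range(len(chain) + 1):
            candidate = chain[:pos] + [c] + chain[pos:]
            kept = rows_following(matrix, rows, candidate)
            if len(kept) > len(best_rows):
                best_chain, best_rows = candidate, kept
        if best_chain is not None and len(best_rows) >= min_rows:
            chain, rows = best_chain, best_rows
    return rows, chain


if __name__ == "__main__":
    # Toy expression matrix: rows are genes, columns are conditions.
    expression = [
        [1.0, 2.0, 3.0, 9.0],
        [10.0, 20.0, 30.0, 5.0],
        [0.1, 0.5, 0.9, 0.3],
        [3.0, 2.0, 1.0, 7.0],
    ]
    genes, conditions = grow_from_seed(expression, seed=(0, 1), min_rows=3)
    print(genes, conditions, is_trend_preserving(expression, genes, conditions))
```

The rows in the toy matrix differ widely in magnitude, yet the first three share the same ordering over the first three columns, which is why comparisons alone, rather than distances between values, are enough to group them in this sketch.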
Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717.
Supplementary data are available at Bioinformatics online.