

RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters.

Author information

Research Center for Mathematics and Interdisciplinary Sciences.

School of Mathematics, Shandong University, Jinan 250100, China.

Publication information

Bioinformatics. 2020 Dec 22;36(20):5054-5060. doi: 10.1093/bioinformatics/btaa630.

Abstract

MOTIVATION

Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize the various kinds of complex biclusters submerged in ever-larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm that identifies various forms of complex biclusters in gene expression datasets.

RESULTS

We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster simply by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters almost perfectly, outperforming all the prominent tools it was compared against in terms of accuracy and robustness to noise and to overlaps between the clusters. Moreover, RecBic also showed superiority in identifying functionally related genes in real gene expression datasets.
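To make the notion of a trend-preserving bicluster concrete, the following minimal Python sketch may help. It is not the RecBic implementation. It assumes the simplest reading of "trend-preserving", namely that every selected gene (row) ranks the selected conditions (columns) in the same order, and it illustrates seeding with a single pair of columns. The function names `is_trend_preserving` and `grow_from_column_seed` and the toy matrix `expr` are hypothetical, introduced only for illustration.

```python
import numpy as np


def is_trend_preserving(matrix, rows, cols):
    """Return True if every selected row orders the selected columns identically.

    Assumption: "trend-preserving" means a shared rank order across columns.
    Ties are broken by column position (stable sort).
    """
    sub = np.asarray(matrix, dtype=float)[np.ix_(rows, cols)]
    orders = np.argsort(sub, axis=1, kind="stable")
    return bool((orders == orders[0]).all())


def grow_from_column_seed(matrix, col_a, col_b):
    """Collect rows whose value rises from col_a to col_b.

    Hypothetical seeding step: it only compares pairs of real numbers,
    in the spirit of the description above, and is not the published procedure.
    """
    m = np.asarray(matrix, dtype=float)
    return np.flatnonzero(m[:, col_a] < m[:, col_b]).tolist()


# Toy expression matrix: 4 genes (rows) x 4 conditions (columns).
expr = np.array([
    [1.0, 3.0, 2.0, 9.0],
    [10.0, 30.0, 20.0, 5.0],
    [0.1, 0.5, 0.2, 0.0],
    [4.0, 1.0, 3.0, 2.0],
])

seed_rows = grow_from_column_seed(expr, 0, 1)
print(seed_rows)
print(is_trend_preserving(expr, seed_rows, [0, 1, 2]))
```

In this toy matrix the first three genes differ widely in scale yet share the same ordering over the first three conditions, which is why a purely comparison-based test groups them together while a distance-based clustering might not.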

AVAILABILITY AND IMPLEMENTATION

Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

