Liu Xiangyu, Yu Ting, Zhao Xiaoyu, Long Chaoyi, Han Renmin, Su Zhengchang, Li Guojun
Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Jinan 250100, China.
Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
NAR Genom Bioinform. 2023 Jan 31;5(1):lqad009. doi: 10.1093/nargab/lqad009. eCollection 2023 Mar.
Identifying significant biclusters of genes with specific expression patterns is an effective approach to reveal functionally correlated genes in gene expression data. However, none of the existing algorithms can simultaneously identify both broader and narrower biclusters, because they fail to balance effectiveness and efficiency. We introduce ARBic, an algorithm capable of accurately identifying significant biclusters of any shape, including broader, narrower and square ones, in any large-scale gene expression dataset. ARBic integrates column-based and row-based strategies into a single biclustering procedure. The column-based strategy, borrowed from RecBic, a recently published biclustering tool, extracts narrower biclusters, while the row-based strategy, which iteratively finds the longest path in a specific directed graph, extracts broader ones. When tested against seven other prominent biclustering algorithms on simulated datasets, ARBic achieves on average at least 29% higher recovery, relevance and [Formula: see text] scores than the best existing tool. In addition, ARBic substantially outperforms all tools on real datasets and is more robust to noise, bicluster shapes and dataset types.
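The abstract states that the row-based strategy repeatedly finds the longest path in a directed graph, but it does not say how that graph is built. As a rough illustration of the core step only, the sketch below computes a longest path under two assumptions that are not taken from the paper: the graph is acyclic, and edge weights are non-negative. The function name `longest_path` and the toy node labels are hypothetical, and this is not ARBic's implementation.

```python
from collections import defaultdict, deque


def longest_path(edges):
    """Return (length, path) for a longest path in a weighted DAG.

    edges: iterable of (u, v, weight) tuples with non-negative weights.
    Raises ValueError if the graph contains a cycle.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0, []

    # Kahn's algorithm: repeatedly emit nodes whose predecessors are all done.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Dynamic programming over the topological order:
    # best[v] is the weight of the heaviest path that ends at v.
    best = {n: 0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in graph[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Walk back from the best end node to recover the path itself.
    end = max(sorted(nodes), key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return best[end], path


# Toy graph with hypothetical labels; not data from the paper.
toy_edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 1), ("c", "d", 1)]
length, path = longest_path(toy_edges)
print(length, path)
```

An iterative variant in the spirit of the abstract could call such a routine repeatedly, removing or down-weighting the nodes of each path found before the next round; whether ARBic proceeds this way is not stated in the abstract.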