Biclustering data analysis: a comprehensive survey.

Affiliation

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal.

Publication information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae342.

Abstract

Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering to other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, the survey provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
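A bicluster pairs a subset of rows with a subset of columns of a data matrix. The short sketch below makes that idea concrete. It is not taken from the survey. It is a minimal illustration that assumes a plain list-of-lists matrix. It scores a candidate bicluster with the mean squared residue of Cheng and Church, one classic coherence measure chosen here for illustration. The helper names `submatrix` and `mean_squared_residue` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a dense matrix as a list of rows.
Matrix = List[List[float]]


def submatrix(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> Matrix:
    """Extract the bicluster defined by a subset of rows and a subset of columns."""
    return [[data[i][j] for j in cols] for i in rows]


def mean_squared_residue(block: Matrix) -> float:
    """Cheng-Church mean squared residue of a block.

    A value of 0 means every row is the same profile shifted by a constant
    (a perfectly additive pattern); larger values mean less coherence.
    """
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: the part of the value not explained by row, column and overall effects.
            residue = block[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Illustrative data only. Rows 0 and 2 restricted to columns 1 and 3 form
# an additive pattern, because row 2 equals row 0 plus 4 on those columns.
data: Matrix = [
    [1.0, 2.0, 9.0, 4.0],
    [7.0, 3.0, 0.0, 8.0],
    [5.0, 6.0, 1.0, 8.0],
    [2.0, 9.0, 4.0, 1.0],
]

bicluster = submatrix(data, rows=[0, 2], cols=[1, 3])
print(bicluster)                        # [[2.0, 4.0], [6.0, 8.0]]
print(mean_squared_residue(bicluster))  # 0.0
```

The local submatrix scores a residue of 0, while the whole matrix does not. This mirrors the abstract's point that biclustering can find coherent local models that a global view of all rows and columns would miss.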
