A systematic comparative evaluation of biclustering techniques.

Authors

Padilha Victor A, Campello Ricardo J G B

Affiliations

Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil.

College of Science and Engineering, James Cook University, Townsville, QLD, Australia.

Publication information

BMC Bioinformatics. 2017 Jan 23;18(1):55. doi: 10.1186/s12859-017-1487-1.

Abstract

BACKGROUND

Biclustering techniques simultaneously cluster the rows and columns of a data matrix. They have become very popular for the analysis of gene expression data, since a gene can take part in multiple biological pathways, which in turn can be active only under specific experimental conditions. Several biclustering algorithms have been developed in recent years. To provide guidance on choosing among them, a few comparative studies have been conducted and reported in the literature. In these studies, however, the performance of the methods was evaluated through external measures that have more recently been shown to have undesirable properties. Furthermore, they considered a limited number of algorithms and datasets.

RESULTS

We conducted a broader comparative study involving seventeen algorithms, which were run on three synthetic data collections and two real data collections with a more representative number of datasets. For the experiments with synthetic data, five different experimental scenarios were studied: different levels of noise, different numbers of implanted biclusters, different levels of symmetric bicluster overlap, different levels of asymmetric bicluster overlap and different bicluster sizes, for which the results were assessed with more suitable external measures. For the experiments with real datasets, the results were assessed by gene set enrichment and clustering accuracy.
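As a rough illustration of the synthetic-data setup described above, the following minimal Python sketch implants one constant-valued bicluster into a noisy background matrix and scores a candidate bicluster against the implanted one. Everything in it is an assumption made for illustration: the function names `implant_bicluster` and `cell_jaccard`, the Gaussian background, the constant bicluster pattern, and the cell-level Jaccard index, which is a simple stand-in and not necessarily one of the external measures used in the study.

```python
import random


def implant_bicluster(n_rows, n_cols, rows, cols, value=5.0, noise_sd=0.1, seed=0):
    """Return an n_rows x n_cols matrix (list of lists) of standard-normal
    background values with one constant bicluster implanted at rows x cols.
    Hypothetical helper: the study's actual generators may differ."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]
    for r in rows:
        for c in cols:
            # Constant pattern plus Gaussian noise (the "level of noise" knob).
            data[r][c] = value + rng.gauss(0.0, noise_sd)
    return data


def cell_jaccard(bic_a, bic_b):
    """Jaccard index between two biclusters, each given as (rows, cols),
    computed over the matrix cells they cover. Illustrative measure only."""
    cells_a = {(r, c) for r in bic_a[0] for c in bic_a[1]}
    cells_b = {(r, c) for r in bic_b[0] for c in bic_b[1]}
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0


# Implanted (true) bicluster: rows 0-9, columns 0-4.
truth = (range(0, 10), range(0, 5))
# A hypothetical algorithm output that overlaps the truth on rows 5-9.
found = (range(5, 15), range(0, 5))

data = implant_bicluster(50, 20, *truth)
score = cell_jaccard(truth, found)
print(f"cell-level Jaccard: {score:.3f}")
```

Varying `noise_sd`, the bicluster size, or the number and overlap of implanted biclusters would loosely mirror the five experimental scenarios listed above; a perfect recovery gives a score of 1 and disjoint biclusters give 0.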

CONCLUSIONS

We observed that each algorithm achieved satisfactory results in only some of the biclustering tasks in which it was investigated. The choice of the best algorithm for a given application thus depends on the task at hand and the types of patterns that one wants to detect.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d42f/5259837/5554dd982c1d/12859_2017_1487_Fig1_HTML.jpg
