LCE: a link-based cluster ensemble method for improved gene expression data analysis.

Affiliation

Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, UK.

Publication information

Bioinformatics. 2010 Jun 15;26(12):1513-9. doi: 10.1093/bioinformatics/btq226. Epub 2010 May 5.

DOI:10.1093/bioinformatics/btq226

PMID:20444838

Abstract

MOTIVATION

It is far from trivial to select the most effective clustering method and its parameterization, for a particular set of gene expression data, because there are a very large number of possibilities. Although many researchers still prefer to use hierarchical clustering in one form or another, this is often sub-optimal. Cluster ensemble research solves this problem by automatically combining multiple data partitions from different clusterings to improve both the robustness and quality of the clustering result. However, many existing ensemble techniques use an association matrix to summarize sample-cluster co-occurrence statistics, and relations within an ensemble are encapsulated only at coarse level, while those existing among clusters are completely neglected. Discovering these missing associations may greatly extend the capability of the ensemble methodology for microarray data clustering.
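To make the distinction above concrete, the following is a minimal sketch of a binary sample-cluster association matrix and one way of filling in its zero entries from relations among clusters. It is not the authors' algorithm: the shared-neighbour score, the `decay` factor, and all function names are assumptions chosen for illustration, and the abstract does not specify how LCE computes cluster-to-cluster similarity.

```python
def cluster_list(partitions):
    """Flatten an ensemble into (partition_index, member_set) pairs."""
    clusters = []
    for p, labels in enumerate(partitions):
        for lab in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == lab)
            clusters.append((p, members))
    return clusters


def overlap(a, b):
    """Jaccard overlap between two non-empty member sets."""
    return len(a & b) / len(a | b)


def shared_neighbour_similarity(clusters):
    """Score cluster pairs by the clusters they both overlap with.

    Clusters of one partition never share samples, so their similarity
    is inferred from common neighbours in the other partitions.
    This scoring rule is an illustrative assumption.
    """
    m = len(clusters)
    w = [[overlap(clusters[a][1], clusters[b][1]) if a != b else 0.0
          for b in range(m)] for a in range(m)]
    raw = [[sum(min(w[a][k], w[b][k]) for k in range(m))
            for b in range(m)] for a in range(m)]
    # Normalise by the largest off-diagonal score (fall back to 1.0 if all are zero).
    top = max(raw[a][b] for a in range(m) for b in range(m) if a != b) or 1.0
    return [[raw[a][b] / top for b in range(m)] for a in range(m)]


def refined_association(partitions, decay=0.9):
    """Return a samples x clusters matrix with soft entries.

    An entry is 1.0 when the sample belongs to the cluster. Otherwise it is
    decay * similarity between that cluster and the sample's own cluster in
    the same partition (a plain binary matrix would hold 0 there).
    """
    clusters = cluster_list(partitions)
    sim = shared_neighbour_similarity(clusters)
    rows = []
    for i in range(len(partitions[0])):
        # Index of the cluster that sample i belongs to in each partition.
        own = {p: j for j, (p, members) in enumerate(clusters) if i in members}
        rows.append([1.0 if own[p] == j else decay * sim[own[p]][j]
                     for j, (p, _) in enumerate(clusters)])
    return rows


# Hypothetical ensemble: two base clusterings of six samples.
ensemble = [[0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]]
for row in refined_association(ensemble):
    print(row)
```

Each row of the result is a numerical feature vector for one sample, so the matrix can in principle be passed to any clustering technique that accepts numerical data.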

RESULTS

The link-based cluster ensemble (LCE) method, presented here, implements these ideas and demonstrates outstanding performance. Experiment results on real gene expression and synthetic datasets indicate that LCE: (i) usually outperforms the existing cluster ensemble algorithms in individual tests and, overall, is clearly class-leading; (ii) generates excellent, robust performance across different types of data, especially with the presence of noise and imbalanced data clusters; (iii) provides a high-level data matrix that is applicable to many numerical clustering techniques; and (iv) is computationally efficient for large datasets and gene clustering.

AVAILABILITY

Online supplementary and implementation are available at: http://users.aber.ac.uk/nii07/bioinformatics2010.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Analysis of a Gibbs sampler method for model-based clustering of gene expression data.

Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.

Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.

Class discovery from gene expression data based on perturbation and cluster ensemble.

IEEE Trans Nanobioscience. 2009 Jun;8(2):147-60. doi: 10.1109/TNB.2009.2023321. Epub 2009 Jun 2.

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Knowledge based cluster ensemble for cancer discovery from biomolecular data.

IEEE Trans Nanobioscience. 2011 Jun;10(2):76-85. doi: 10.1109/TNB.2011.2144997. Epub 2011 Jul 7.

Cited by

Parallel median consensus clustering in complex networks.

Sci Rep. 2025 Jan 30;15(1):3788. doi: 10.1038/s41598-025-87479-6.

A new multivariate blood glucose prediction method with hybrid feature clustering and online transfer learning.

Health Inf Sci Syst. 2024 Nov 17;12(1):57. doi: 10.1007/s13755-024-00313-7. eCollection 2024 Dec.

clusterBMA: Bayesian model averaging for clustering.

PLoS One. 2023 Aug 21;18(8):e0288000. doi: 10.1371/journal.pone.0288000. eCollection 2023.

Predicting implementation of active learning by tenure-track teaching faculty using robust cluster analysis.

Int J STEM Educ. 2022;9(1):49. doi: 10.1186/s40594-022-00365-9. Epub 2022 Jul 28.

Machine learning: its challenges and opportunities in plant system biology.

Appl Microbiol Biotechnol. 2022 May;106(9-10):3507-3530. doi: 10.1007/s00253-022-11963-6. Epub 2022 May 16.

A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping.

Genes (Basel). 2019 Jan 18;10(1):66. doi: 10.3390/genes10010066.

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities.

Inf Fusion. 2019 Oct;50:71-91. doi: 10.1016/j.inffus.2018.09.012. Epub 2018 Sep 21.

diceR: an R package for class discovery using an ensemble driven approach.

BMC Bioinformatics. 2018 Jan 15;19(1):11. doi: 10.1186/s12859-017-1996-y.

Critical limitations of consensus clustering in class discovery.

Sci Rep. 2014 Aug 27;4:6207. doi: 10.1038/srep06207.

Semi-supervised consensus clustering for gene expression data analysis.

BioData Min. 2014 May 8;7:7. doi: 10.1186/1756-0381-7-7. eCollection 2014.
