Clustering of gene expression data: performance and similarity analysis.

Authors

Yin Longde, Huang Chun-Hsi, Ni Jun

Affiliation

Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA.

Publication information

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

DOI:10.1186/1471-2105-7-S4-S19

PMID:17217511

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1780119/

Abstract

BACKGROUND

DNA microarray technology is an innovative methodology in experimental molecular biology that has produced large amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. Evaluating which clustering algorithms are feasible and applicable is becoming an important issue in bioinformatics research.

RESULTS

In this paper we first experimentally study three major clustering algorithms, Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self-Organizing Tree Algorithm (SOTA), using yeast (Saccharomyces cerevisiae) gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to analyze the similarity of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM, while HC is the least efficient. The similarity analysis shows that, given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. It is therefore an effective approach for evaluating different clustering algorithms.
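The closest-match idea described above can be illustrated with a small sketch. The abstract does not say which similarity measure Cluster Diff uses, so the Jaccard index below is an assumption chosen for illustration, and the function names, gene labels, and cluster names are all hypothetical toy values rather than anything taken from the tool or the paper's data.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared genes divided by all genes in either cluster (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_match(target: Set[str], candidates: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return (name, score) of the candidate cluster most similar to the target."""
    if not candidates:
        raise ValueError("no candidate clusters to compare against")
    name = max(candidates, key=lambda k: jaccard(target, candidates[k]))
    return name, jaccard(target, candidates[name])


# Toy data (hypothetical): one cluster from an HC run, three clusters from a SOM run.
hc_cluster = {"g1", "g2", "g3", "g5"}
som_clusters = {
    "som_0": {"g1", "g2", "g3"},
    "som_1": {"g5", "g7"},
    "som_2": {"g8"},
}

best_name, best_score = closest_match(hc_cluster, som_clusters)
print(best_name, round(best_score, 2))
```

Here `som_0` shares three of the four genes in the union with the target, so it is reported as the closest match. A set-overlap score like this only looks at cluster membership; a tool that also compares expression profiles would need a different measure.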

CONCLUSION

HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. SOM is more robust against noise, but the number of clusters has to be fixed beforehand. SOTA combines the advantages of both hierarchical and SOM clustering: it allows a visual representation of the clusters and their structure and is not sensitive to noise. SOTA is also more flexible than the other two clustering methods. Using our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby compare different clustering methods.
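As a rough illustration of what hierarchical clustering does with expression profiles, the sketch below runs a naive agglomerative procedure on four toy genes. The abstract does not specify the distance measure or linkage rule used in the study, so the Pearson-correlation distance and average linkage here are assumptions, and the profiles are made-up values, not data from the paper.

```python
from math import sqrt
from typing import Dict, List, Tuple

Merge = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def pearson_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson correlation; values near 0 mean the profiles have the same shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treated as uncorrelated (an assumption)
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles: Dict[str, List[float]]) -> List[Merge]:
    """Naive agglomerative clustering; returns the merge history (left, right, distance)."""
    clusters = [(name,) for name in profiles]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the members of the two clusters.
                total = sum(pearson_distance(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy expression profiles (hypothetical): two rising genes and two falling genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}

merges = average_linkage(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

The merge history is the information a dendrogram draws, which is the visual representation the conclusion refers to. This sketch rescans every pair of clusters in each round, so its cost grows quickly with the number of genes; optimized implementations avoid much of that work.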

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8e1/1780119/c2bfe1fc5885/1471-2105-7-S4-S19-1.jpg
