Judging the quality of gene expression-based clustering methods using gene annotation.

Authors

Gibbons Francis D, Roth Frederick P

Affiliation

Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA.

Publication

Genome Res. 2002 Oct;12(10):1574-81. doi: 10.1101/gr.397002.

Abstract

We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
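The figure of merit described above rests on the mutual information between cluster membership and a known gene attribute. The following is a minimal sketch of that quantity only, assuming each gene carries one cluster label and one binary attribute flag. The function name `mutual_information` and the toy data are hypothetical, and the abstract does not spell out how the authors turn this quantity into their final score, so this should not be read as their implementation.

```python
from collections import Counter
from math import log2


def mutual_information(clusters, attribute):
    """Mutual information (in bits) between two equal-length label sequences.

    clusters  : one cluster label per gene
    attribute : one attribute value per gene (e.g. True if annotated)
    """
    if len(clusters) != len(attribute):
        raise ValueError("label sequences must have the same length")
    n = len(clusters)
    joint = Counter(zip(clusters, attribute))   # counts of (cluster, attribute) pairs
    cluster_counts = Counter(clusters)          # marginal counts per cluster
    attribute_counts = Counter(attribute)       # marginal counts per attribute value
    mi = 0.0
    for (c, a), count in joint.items():
        # p(c, a) * log2( p(c, a) / (p(c) * p(a)) ), written with raw counts
        mi += (count / n) * log2(count * n / (cluster_counts[c] * attribute_counts[a]))
    return mi


# Hypothetical toy data: four genes, two clusters, one binary attribute.
clusters = [0, 0, 1, 1]
has_attribute = [True, True, False, False]
print(mutual_information(clusters, has_attribute))
```

In this toy case cluster membership predicts the attribute perfectly, so the result is one bit; if the attribute were spread evenly across both clusters, the result would be zero.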
