Effect of data normalization on fuzzy clustering of DNA microarray data.

Authors

Kim Seo Young, Lee Jae Won, Bae Jong Sung

Affiliation

Research Institute for Basic Science, Chonnam National University, Gwangju, 500-757, Korea.

Publication information

BMC Bioinformatics. 2006 Mar 14;7:134. doi: 10.1186/1471-2105-7-134.

DOI:10.1186/1471-2105-7-134

PMID:16533412

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1431564/

Abstract

BACKGROUND

Microarray technology has made it possible to measure the expression levels of large numbers of genes simultaneously and in a short time. Gene expression data are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. In microarray data analysis, clustering is an important tool for finding groups of genes with similar expression patterns. However, hard clustering methods, which assign each gene to exactly one cluster, are poorly suited to microarray datasets because the clusters of genes in such datasets frequently overlap.

RESULTS

In this study we applied the fuzzy partitional clustering method known as Fuzzy C-Means (FCM) to overcome the limitations of hard clustering. To identify the effect of data normalization, we normalized three microarray datasets and three simulated datasets with three methods: the two common scale and location transformations, and Lowess normalization. First, we determined the optimal parameters for FCM clustering. We found that the optimal fuzzification parameter in the FCM analysis of a microarray dataset depended on the normalization method applied to the dataset during preprocessing. We additionally evaluated how normalizing noisy datasets affected the results of hard clustering and of FCM clustering. The effects of normalization were evaluated using both simulated datasets and microarray datasets. A comparative analysis showed that the clustering results depended on the normalization method used and the noisiness of the data. In particular, for datasets with large variations across samples, the selection of the fuzzification parameter value for the FCM method was sensitive to the normalization method used.
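To make the role of the fuzzification parameter concrete, the following is a minimal sketch of the textbook FCM iteration in Python with NumPy. It is not the authors' implementation. The abstract does not say which scale and location transformations were used, so the per-gene standardization in `standardize_rows` is an assumption made purely for illustration, as are the function names, the Euclidean distance, the random initialization, and the default `m=2.0`.

```python
import numpy as np


def standardize_rows(x):
    """Illustrative location and scale transform (an assumption, not the
    paper's stated method): center each gene (row) to mean 0, unit SD."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant rows at zero instead of dividing by 0
    return (x - mu) / sd


def fuzzy_c_means(x, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM with Euclidean distance.

    x: (n_genes, n_samples) array, c: number of clusters,
    m: fuzzification parameter (> 1); larger m gives softer memberships.
    Returns (centers, u), where u[i, k] is the membership of gene i in
    cluster k and every row of u sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random initial memberships, normalized so each row sums to 1.
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Center update: mean of the data weighted by u**m.
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Distance of every gene to every center, shape (n, c).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u
```

Because the exponent `-2 / (m - 1)` acts on distances, rescaling the data during preprocessing changes how sharply memberships separate for a given `m`, which is one way to see why the best value of `m` could depend on the normalization applied.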

CONCLUSION

When samples have varying expression patterns or are noisy, Lowess normalization is more robust for clustering genes from general microarray data than the two common scale and location adjustment methods. In particular, the FCM method slightly outperformed the hard clustering methods when the expression patterns of genes overlapped, and it was advantageous in finding co-regulated genes. Thus, the FCM approach offers a convenient method for finding subsets of genes that are strongly associated with a given cluster.
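One common way to extract genes that are strongly associated with a cluster from an FCM result is to threshold the membership matrix. The sketch below assumes a membership matrix `u` with one row per gene and one column per cluster. The function name `core_members` and the cutoff of 0.7 are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np


def core_members(u, threshold=0.7):
    """For each cluster, return the indices of genes whose membership in
    that cluster is at least `threshold` (a hypothetical cutoff)."""
    u = np.asarray(u, dtype=float)
    return [np.flatnonzero(u[:, k] >= threshold) for k in range(u.shape[1])]


# Toy membership matrix: gene 1 sits between the two clusters, so it is
# not selected as a core member of either one.
u_example = np.array([[0.90, 0.10],
                      [0.55, 0.45],
                      [0.20, 0.80]])
cores = core_members(u_example)
```

With a cutoff above 0.5, a gene can be a core member of at most one cluster, and genes with overlapping expression patterns are left unassigned rather than forced into a single group as in hard clustering.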
