A simple and fast method to determine the parameters for fuzzy c-means cluster analysis.

Affiliation

Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.

Publication information

Bioinformatics. 2010 Nov 15;26(22):2841-8. doi: 10.1093/bioinformatics/btq534. Epub 2010 Sep 29.

DOI:10.1093/bioinformatics/btq534

PMID:20880957

Abstract

MOTIVATION

Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values of algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random dataset but which detects cluster formation with maximum resolution on the edge of randomness.

RESULTS

Estimation of the optimal parameter values is achieved by evaluation of the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the dataset. Taking the dimension of the set and the number of objects as input values instead of evaluating the entire dataset allows us to propose a functional relationship determining the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier as typically done in many previous studies. Validation indices are generally used for the estimation of the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are at least equivalent or better than those obtained by other computationally more expensive indices.
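The validation index named above, the minimum distance between centroids, can be illustrated with a minimal sketch. It assumes a plain NumPy implementation of standard fuzzy c-means; the function names `fuzzy_c_means` and `min_centroid_distance`, the synthetic data, and the fuzzifier value used in the demo are illustrative assumptions rather than the authors' code. The functional relationship for the fuzzifier is not given in the abstract, so `m` is simply passed in as a parameter.

```python
import numpy as np


def fuzzy_c_means(X, c, m, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the objects.
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every object to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids, U


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between centroids."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    upper = np.triu_indices(len(centroids), k=1)
    return dist[upper].min()


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated Gaussian blobs.
    rng = np.random.default_rng(1)
    X = np.vstack([
        rng.normal(0.0, 0.5, size=(50, 2)),
        rng.normal(10.0, 0.5, size=(50, 2)),
    ])
    # m = 2.0 is only a placeholder for this demo.
    for c in range(2, 6):
        centroids, _ = fuzzy_c_means(X, c=c, m=2.0)
        print(c, min_centroid_distance(centroids))
```

In a sketch like this, one would scan candidate cluster numbers and look at how the minimum centroid distance changes; a sharp drop suggests that centroids have started to coincide or to split a single cluster. The exact decision rule used in the paper is not stated in the abstract.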

Similar articles

A simple and fast method to determine the parameters for fuzzy c-means cluster analysis.

Bioinformatics. 2010 Nov 15;26(22):2841-8. doi: 10.1093/bioinformatics/btq534. Epub 2010 Sep 29.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters.

Comput Intell Neurosci. 2016;2016:2647389. doi: 10.1155/2016/2647389. Epub 2016 Nov 29.

Towards clustering of incomplete microarray data without the use of imputation.

Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

Effect of data normalization on fuzzy clustering of DNA microarray data.

BMC Bioinformatics. 2006 Mar 14;7:134. doi: 10.1186/1471-2105-7-134.

Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.

Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.

J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.

An automated method for gridding and clustering-based segmentation of cDNA microarray images.

Comput Med Imaging Graph. 2009 Jan;33(1):40-9. doi: 10.1016/j.compmedimag.2008.10.003. Epub 2008 Nov 28.

On computing the fuzzifier in downward arrow FLVQ: a data driven approach.

Int J Neural Syst. 2002 Apr;12(2):149-57. doi: 10.1142/S0129065702001060.

Cited by

Integration of smart insoles for gait assessment in exoskeleton assisted rehabilitation.

Sci Rep. 2025 Aug 4;15(1):28350. doi: 10.1038/s41598-025-10032-y.

2-oxoglutarate:acceptor oxidoreductase-catalyzed redox cycling effectively targets coccoid forms of Helicobacter pylori.

Nat Commun. 2025 Jul 29;16(1):6965. doi: 10.1038/s41467-025-62477-4.

Branched-chain amino acids modulate the proteomic profile of Trypanosoma cruzi metacyclogenesis induced by proline.

PLoS Negl Trop Dis. 2024 Oct 9;18(10):e0012588. doi: 10.1371/journal.pntd.0012588. eCollection 2024 Oct.

Temporally aligned segmentation and clustering (TASC) framework for behavior time series analysis.

Sci Rep. 2024 Jun 28;14(1):14952. doi: 10.1038/s41598-024-63669-6.

Temporal gene expression during asexual development of the apicomplexan .

mSphere. 2024 Jun 25;9(6):e0011124. doi: 10.1128/msphere.00111-24. Epub 2024 May 29.

Decoding cocaine-induced proteomic adaptations in the mouse nucleus accumbens.

Sci Signal. 2024 Apr 16;17(832):eadl4738. doi: 10.1126/scisignal.adl4738.

Loss of CREBBP and KMT2D cooperate to accelerate lymphomagenesis and shape the lymphoma immune microenvironment.

Nat Commun. 2024 Apr 3;15(1):2879. doi: 10.1038/s41467-024-47012-1.

The expression landscape and pangenome of long non-coding RNA in the fungal wheat pathogen .

Microb Genom. 2023 Nov;9(11). doi: 10.1099/mgen.0.001136.

An integrated gene-to-outcome multimodal database for metabolic dysfunction-associated steatotic liver disease.

Nat Med. 2023 Nov;29(11):2939-2953. doi: 10.1038/s41591-023-02602-2. Epub 2023 Oct 30.

Genomic and transcriptomic analyses reveal polygenic architecture for ecologically important traits in aspen ( Michx.).

Ecol Evol. 2023 Sep 28;13(10):e10541. doi: 10.1002/ece3.10541. eCollection 2023 Oct.
