
Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.

Authors

Arima Chinatsu, Hakamada Kazumi, Okamoto Masahiro, Hanai Taizo

Affiliation

Graduate School of Systems Life Sciences, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan.

Publication information

J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.

DOI: 10.1263/jbb.105.273
PMID: 18397779

Abstract

In clustering methods, the estimation of the optimal number of clusters is significant for subsequent analysis. Without detailed biological information on the genes involved, the evaluation of the number of clusters becomes difficult, and we have to rely on an internal measure that is based on the distribution of the data of the clustering result. The Gap statistic has been proposed as a superior method for estimating the number of clusters in crisp clustering. In this study, we proposed a modified Fuzzy Gap statistic (MFGS) and applied it to fuzzy k-means clustering. For estimating the number of clusters, fuzzy k-means clustering with the MFGS was applied to two artificial data sets with noise and to two experimentally observed gene expression data sets. For the artificial data sets, compared with other internal measures, the MFGS showed a higher performance in terms of robustness against noise for estimating the optimal number of clusters. Moreover, it could be used to estimate the optimal number of clusters in experimental data sets. It was confirmed that the proposed MFGS is a useful method for estimating the number of clusters for microarray data sets.
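
The abstract does not state the MFGS formula, so the following is only a sketch of the general idea it builds on: the standard Gap statistic compares the log of a within-cluster dispersion with its average under a structureless reference distribution, and here the fuzzy k-means objective is swapped in as the dispersion. The function names (`fuzzy_kmeans`, `fuzzy_gap`, `choose_k`), the uniform bounding-box reference, the fuzzifier `m = 2`, and the "first k whose gap is within one standard error of the next" selection rule are all assumptions made for illustration; this is not the authors' MFGS.

```python
import numpy as np


def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Plain fuzzy k-means. Returns (centers, memberships, objective)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distances to every center, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Fuzzy within-cluster dispersion: sum of u^m times squared distance.
    objective = float(((U ** m) * d2).sum())
    return centers, U, objective


def fuzzy_gap(X, k_values, n_ref=10, m=2.0, seed=0):
    """Gap-style curve using the fuzzy objective (illustrative, not MFGS)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sds = [], []
    for k in k_values:
        log_w = np.log(fuzzy_kmeans(X, k, m=m, rng=rng)[2])
        # Reference sets: uniform samples over the bounding box of the data.
        ref = np.array([
            np.log(fuzzy_kmeans(rng.uniform(lo, hi, size=X.shape), k, m=m, rng=rng)[2])
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        sds.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    return np.array(gaps), np.array(sds)


def choose_k(k_values, gaps, sds):
    """Smallest k with gap(k) >= gap(k+1) - s(k+1); falls back to the last k."""
    for i in range(len(k_values) - 1):
        if gaps[i] >= gaps[i + 1] - sds[i + 1]:
            return k_values[i]
    return k_values[-1]
```

A typical call would be `gaps, sds = fuzzy_gap(X, [1, 2, 3, 4, 5])` followed by `choose_k([1, 2, 3, 4, 5], gaps, sds)`, where `X` is an `(n_samples, n_features)` array such as a matrix of gene expression profiles. Because fuzzy k-means starts from random memberships, the curve can vary between seeds; running several restarts per k and keeping the lowest objective is a common safeguard.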


Similar articles

1
Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.
J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.
2
Detecting clusters of different geometrical shapes in microarray gene expression data.
Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.
3
Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.
Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.
4
Analysis of a Gibbs sampler method for model-based clustering of gene expression data.
Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.
5
Microarray data clustering based on temporal variation: FCV with TSD preclustering.
Appl Bioinformatics. 2003;2(1):35-45.
6
Clustering of change patterns using Fourier coefficients.
Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.
7
Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps.
Artif Intell Med. 2010 Feb-Mar;48(2-3):91-8. doi: 10.1016/j.artmed.2009.06.001. Epub 2009 Dec 4.
8
Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses.
Artif Intell Med. 2006 Jun;37(2):85-109. doi: 10.1016/j.artmed.2006.03.005. Epub 2006 May 23.
9
Towards clustering of incomplete microarray data without the use of imputation.
Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.
10
Estimating number of clusters based on a general similarity matrix with application to microarray data.
Stat Appl Genet Mol Biol. 2008;7(1):Article24. doi: 10.2202/1544-6115.1261. Epub 2008 Aug 2.

Cited by

1
Dual blockade of IL-10 and PD-1 leads to control of SIV viral rebound following analytical treatment interruption.
Nat Immunol. 2024 Oct;25(10):1900-1912. doi: 10.1038/s41590-024-01952-4. Epub 2024 Sep 12.
2
Detecting Non-Overlapping Signals with Dynamic Programming.
Entropy (Basel). 2023 Jan 30;25(2):250. doi: 10.3390/e25020250.
3
Analysis of gene expression profiles of soft tissue sarcoma using a combination of knowledge-based filtering with integration of multiple statistics.
PLoS One. 2014 Sep 4;9(9):e106801. doi: 10.1371/journal.pone.0106801. eCollection 2014.
4
PPINGUIN: Peptide Profiling Guided Identification of Proteins improves quantitation of iTRAQ ratios.
BMC Bioinformatics. 2012 Feb 16;13:34. doi: 10.1186/1471-2105-13-34.