Application of fuzzy c-means clustering in data analysis of metabolomics.

Authors

Li Xiang, Lu Xin, Tian Jing, Gao Peng, Kong Hongwei, Xu Guowang

Affiliation

CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China.

Publication information

Anal Chem. 2009 Jun 1;81(11):4468-75. doi: 10.1021/ac900353t.

DOI:10.1021/ac900353t

PMID:19408956

Abstract

Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to cluster metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix, which represents the degree of association the samples have with each cluster. The method is parametrized with the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both have been optimized by combining FCM with partial least-squares (PLS), using the membership matrix as the Y matrix in the PLS model. The quality parameters R²Y and Q² of the PLS model have been used to monitor and optimize C and m. Metabolic profile data from three gene types of Escherichia coli were used to demonstrate the method. Different multivariate analysis methods have been compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded overfitted results. On the basis of the optimized parameters, FCM was able to reveal the main phenotype changes and individual characteristics of the three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
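
As a rough illustration of the membership matrix described in the abstract, the following is a minimal sketch of the standard (Bezdek) FCM iteration in plain Python. It is not the authors' implementation: the function name fuzzy_c_means, the toy data, the random initialization, and the stopping rule are all assumptions made for illustration. The PLS step that the abstract uses to tune C and m (via R²Y and Q²) is not shown.

```python
import random

def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric samples.

    Returns (centers, u), where u[i][k] is the membership of sample i in
    cluster k and every row of u sums to 1. Illustrative sketch only.
    """
    if m <= 1.0:
        raise ValueError("fuzziness coefficient m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalized so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the samples weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from squared Euclidean distances to each center.
        new_u = []
        for x in data:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, ctr)) for ctr in centers]
            if min(d2) == 0.0:
                # Sample coincides with a center: share membership among
                # the coinciding centers only.
                hits = [1.0 if v == 0.0 else 0.0 for v in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop when the largest change in any membership is below tol.
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Toy data (hypothetical): two compact groups plus one in-between sample.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
           [2.6, 2.5]]
centers, u = fuzzy_c_means(samples, c=2, m=2.0)
for row in u:
    print([round(v, 3) for v in row])
```

In this toy run the six grouped samples should receive a membership near 1 for one cluster, while the in-between sample is split roughly evenly, which is the kind of ambiguity a hard partition would hide. As m approaches 1 the memberships approach a hard partition; larger m spreads membership more evenly across clusters.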
