Li Xiang, Lu Xin, Tian Jing, Gao Peng, Kong Hongwei, Xu Guowang
CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China.
Anal Chem. 2009 Jun 1;81(11):4468-75. doi: 10.1021/ac900353t.
Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix, which represents the degree of association of each sample with each cluster. The method is parameterized by the number of clusters (C) and the fuzziness coefficient (m), which sets the degree of fuzziness in the algorithm. Both were optimized by combining FCM with partial least-squares (PLS), using the membership matrix as the Y matrix in the PLS model. The quality parameters R²Y and Q² of the PLS model were used to monitor and optimize C and m. Metabolic profile data from three gene types of Escherichia coli were used to demonstrate the method, and different multivariate analysis methods were compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded overfitted results. With the optimized parameters, FCM was able to reveal the main phenotype changes and the individual characteristics of the three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
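The membership matrix described above can be illustrated with a minimal sketch of the standard FCM iteration. This is not the authors' implementation: the function name `fuzzy_c_means`, the squared Euclidean distance, the random initialization, the stopping tolerance, and the toy `samples` are all assumptions made for illustration, and the PLS-based selection of C and m is not shown.

```python
import random


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length feature vectors.

    Illustrative sketch of textbook FCM; all defaults here are assumptions.
    """
    if m <= 1.0:
        raise ValueError("fuzziness coefficient m must be > 1")
    rng = random.Random(seed)
    n_samples, n_features = len(data), len(data[0])

    # Random initial membership matrix; each row is normalised to sum to 1.
    u = []
    for _ in range(n_samples):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Cluster centers: mean of the samples weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n_samples)]
            w_sum = sum(weights)
            centers.append([
                sum(w * x[j] for w, x in zip(weights, data)) / w_sum
                for j in range(n_features)
            ])

        # Memberships: proportional to squared distance ** (-1 / (m - 1)).
        new_u = []
        for x in data:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            if min(d2) == 0.0:
                # Sample sits exactly on a center: share membership among exact hits.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
                continue
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        # Stop once no membership value changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy data (hypothetical): two tight groups plus one intermediate sample.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
           [2.6, 2.5]]
centers, memberships = fuzzy_c_means(samples, n_clusters=2, m=2.0)
for row in memberships:
    print([round(v, 3) for v in row])
```

Each row of `memberships` sums to 1. Samples near a cluster center get a membership close to 1 for that cluster, while the intermediate sample is split between the two, which is the kind of graded assignment that makes ambiguous samples and potential outliers visible. Larger values of `m` push all memberships toward uniform, and values close to 1 approach a hard assignment.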