Application of fuzzy c-means clustering in data analysis of metabolomics.

Authors

Li Xiang, Lu Xin, Tian Jing, Gao Peng, Kong Hongwei, Xu Guowang

Affiliation

CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China.

Publication

Anal Chem. 2009 Jun 1;81(11):4468-75. doi: 10.1021/ac900353t.

Abstract

Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix, which represents the degree of association of each sample with each cluster. The method is parametrized by the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both were optimized by combining FCM with partial least-squares (PLS), using the membership matrix as the Y matrix in the PLS model. The quality parameters R²Y and Q² of the PLS model were used to monitor and optimize C and m. Metabolic profile data from three gene types of Escherichia coli were used to demonstrate the method. Different multivariate analysis methods were compared: principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded overfitted results. With the optimized parameters, FCM revealed the main phenotype changes and individual characteristics of the three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
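
To make the role of the membership matrix and of the parameters C and m concrete, the following is a minimal sketch of FCM in plain Python. The abstract does not state the update equations, so the sketch assumes the standard Bezdek formulation (Euclidean distance, alternating center and membership updates); it is not the authors' implementation. The function name `fuzzy_c_means` and the arguments `max_iter`, `tol`, and `seed` are hypothetical and chosen only for illustration. The PLS step used to tune C and m is not shown.

```python
import math
import random


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard (Bezdek) fuzzy c-means; an assumed variant, not the paper's code.

    data       : list of samples, each a list of floats of equal length
    n_clusters : number of clusters (C in the abstract)
    m          : fuzziness coefficient, must be greater than 1
    Returns (centers, membership), where membership[i][k] is the degree to
    which sample i belongs to cluster k and every row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzziness coefficient m must be > 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial membership matrix, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the samples weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * x[j] for w, x in zip(weights, data)) / wsum
                for j in range(dim)
            ])

        # Membership update from the distances to the new centers.
        new_u = []
        for x in data:
            dists = [math.dist(x, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Sample sits exactly on a center: split membership among those.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
                continue
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        # Stop once no membership value moves by more than tol.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centers, u
```

In this formulation, values of m close to 1 push the memberships toward a hard assignment, while larger values spread each sample's membership more evenly across clusters, which is why m has to be tuned together with C.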
