Probabilistic principal component analysis for metabolomic data.

Affiliation

School of Mathematical Sciences, University College Dublin, Ireland.

Publication information

BMC Bioinformatics. 2010 Nov 23;11:571. doi: 10.1186/1471-2105-11-571.

DOI:10.1186/1471-2105-11-571

PMID:21092268

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3006395/

Abstract

BACKGROUND

Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.

RESULTS

Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced; it provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed throughout to construct confidence intervals for estimated model parameters. The optimal number of principal components is determined using the Bayesian Information Criterion (BIC) model selection tool, which is modified to address the high dimensionality of the data.
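As a rough illustration of BIC-based selection of the number of principal components, the sketch below scores candidate PPCA models from their maximized log-likelihoods. It assumes the standard PPCA formulation, in which a p-dimensional observation is Gaussian with covariance WWᵀ + σ²I for a p × q loadings matrix W, and it uses the ordinary, unmodified BIC in its "larger is better" form. The abstract does not say how the authors modify the BIC for high-dimensional data, so that modification is not reproduced here. The function names and the example log-likelihood values are hypothetical.

```python
import math


def ppca_free_parameters(p: int, q: int) -> int:
    """Count free parameters of a PPCA model (p variables, q components).

    The loadings contribute p*q, less q*(q-1)/2 because the loadings are
    only identified up to rotation; add 1 for the noise variance and p
    for the mean vector.
    """
    return p * q - q * (q - 1) // 2 + 1 + p


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard BIC, larger-is-better form: 2*logL - k*log(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


def select_q(log_likelihoods: dict, p: int, n_obs: int) -> int:
    """Return the candidate q with the highest BIC.

    `log_likelihoods` maps each candidate q to the maximized
    log-likelihood of the fitted model (fitting is not shown here).
    """
    scores = {
        q: bic(ll, ppca_free_parameters(p, q), n_obs)
        for q, ll in log_likelihoods.items()
    }
    return max(scores, key=scores.get)


# Hypothetical log-likelihoods for q = 1, 2, 3 on 20 samples, 5 variables.
example_fits = {1: -120.0, 2: -100.0, 3: -99.0}
best_q = select_q(example_fits, p=5, n_obs=20)
```

In this made-up example the gain in log-likelihood from q = 2 to q = 3 is too small to offset the penalty for the extra parameters, so the two-component model is preferred.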

CONCLUSIONS

The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examining confidence intervals for model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.
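The jackknife intervals mentioned above can be sketched generically: recompute a statistic with each observation left out in turn, and use the spread of those leave-one-out values to estimate a standard error. The code below is a minimal sketch for a scalar statistic, not the MetabolAnalyze implementation. The function name `jackknife_ci` and the normal quantile 1.96 (for an approximate 95% interval) are assumptions. For a loading, `statistic` would refit the model on the reduced sample and return that loading.

```python
import math
from statistics import mean


def jackknife_ci(data, statistic, z=1.96):
    """Leave-one-out jackknife estimate with a normal-approximation interval.

    Returns (estimate, (lower, upper)), where `estimate` is the statistic
    on the full sample and the interval is estimate +/- z * jackknife SE.
    """
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("jackknife needs at least two observations")

    # Recompute the statistic with each observation removed in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)

    # Jackknife standard error: sqrt((n-1)/n * sum of squared deviations).
    se = math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in leave_one_out))

    estimate = statistic(data)
    return estimate, (estimate - z * se, estimate + z * se)


# Toy usage with the sample mean as the statistic.
estimate, (lower, upper) = jackknife_ci([1.0, 2.0, 3.0, 4.0, 5.0], mean)
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which gives a quick sanity check. An interval for a loading that excludes zero would suggest that the corresponding variable contributes to that component.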

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dac/3006395/c88688cd15c6/1471-2105-11-571-1.jpg
