Probabilistic principal component analysis for metabolomic data.

Author information

School of Mathematical Sciences, University College Dublin, Ireland.

Publication information

BMC Bioinformatics. 2010 Nov 23;11:571. doi: 10.1186/1471-2105-11-571.

Abstract

BACKGROUND

Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model.

RESULTS

Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced; it provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed throughout to construct confidence intervals for estimated model parameters. The optimal number of principal components is determined using the Bayesian Information Criterion (BIC) model selection tool, which is modified to address the high dimensionality of the data.
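
As a rough illustration of fitting PPCA and choosing the number of components, the sketch below uses the well-known closed-form maximum-likelihood solution for PPCA together with the standard BIC (lower is better). It is not the MetabolAnalyze implementation, and it does not reproduce the modified BIC mentioned above, whose details the abstract does not give. The function names (`fit_ppca`, `bic`, `select_q`) and the simulated data are assumptions made for this example, and the simulation deliberately uses more samples than variables so that the discarded eigenvalues are non-zero.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit with q components (q < p)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    # ML covariance estimate (divides by n, hence bias=True).
    S = np.cov(X - mu, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)            # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Noise variance: average of the p - q discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loadings: leading eigenvectors scaled by sqrt(eigenvalue - sigma2).
    W = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)
    # Maximised log-likelihood, using tr(C^{-1} S) = p at the ML solution.
    loglik = -0.5 * n * (
        p * np.log(2 * np.pi)
        + np.log(eigvals[:q]).sum()
        + (p - q) * np.log(sigma2)
        + p
    )
    return W, mu, sigma2, loglik


def bic(loglik, n, p, q):
    """Standard BIC (not the paper's modified version); lower is better."""
    # Free parameters: loadings up to rotation, the mean, the noise variance.
    k = p * q - q * (q - 1) / 2 + p + 1
    return -2.0 * loglik + k * np.log(n)


def select_q(X, max_q):
    """Return the q in 1..max_q with the smallest BIC, plus all scores."""
    n, p = X.shape
    scores = {q: bic(fit_ppca(X, q)[3], n, p, q) for q in range(1, max_q + 1)}
    return min(scores, key=scores.get), scores


# Simulated data (an assumption for illustration): 2 latent components,
# 10 observed variables, noise standard deviation 0.5.
rng = np.random.default_rng(0)
n, p, true_q = 200, 10, 2
W_true = 2.0 * rng.standard_normal((p, true_q))
Z = rng.standard_normal((n, true_q))
X = Z @ W_true.T + 0.5 * rng.standard_normal((n, p))

best_q, scores = select_q(X, max_q=5)
W_hat, mu_hat, sigma2_hat, _ = fit_ppca(X, best_q)
print("selected q:", best_q, "estimated noise variance:", round(sigma2_hat, 3))
```

With real metabolomic data the number of variables often exceeds the number of samples, in which case some discarded eigenvalues are exactly zero and this plain criterion needs care; that is presumably part of the motivation for the modified criterion described in the abstract.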

CONCLUSIONS

The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.
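
The idea of jackknife confidence intervals for loadings can be sketched as follows. This is a minimal, assumption-laden example: it applies the leave-one-out jackknife to the first loading vector of ordinary PCA as a stand-in for the PPCA parameters, uses a normal-approximation interval with z = 1.96, and resolves the sign ambiguity of each replicate by aligning it with the full-data estimate. The function names and the simulated data are hypothetical and are not taken from MetabolAnalyze.

```python
import numpy as np


def first_loading(X):
    """Unit-norm loading vector of the first principal component."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]


def jackknife_ci(X, z=1.96):
    """Leave-one-out jackknife interval for each entry of the first loading."""
    n, p = X.shape
    full = first_loading(X)
    reps = np.empty((n, p))
    for i in range(n):
        v = first_loading(np.delete(X, i, axis=0))
        # Loadings are defined only up to sign; align with the full estimate.
        reps[i] = v if v @ full >= 0 else -v
    # Jackknife standard error: sqrt((n - 1) / n * sum of squared deviations).
    se = np.sqrt((n - 1) / n * ((reps - reps.mean(axis=0)) ** 2).sum(axis=0))
    return full, full - z * se, full + z * se


# Simulated data (an assumption for illustration): one dominant direction
# that loads on the first three of six variables.
rng = np.random.default_rng(1)
n, p = 60, 6
direction = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(3.0)
X = 3.0 * rng.standard_normal((n, 1)) * direction + 0.5 * rng.standard_normal((n, p))

estimate, lower, upper = jackknife_ci(X)
for j in range(p):
    print(f"variable {j}: {estimate[j]: .3f}  [{lower[j]: .3f}, {upper[j]: .3f}]")
```

An interval that excludes zero suggests the corresponding variable contributes to the component, which is the kind of interpretation the confidence intervals are meant to support.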

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8dac/3006395/c88688cd15c6/1471-2105-11-571-1.jpg
