Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.

Affiliation

Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA, USA.

Publication information

Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6191041/

Abstract

Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies such as label-free mass spectrometry due to large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that this approach generates robust and informative low-dimensional data representations compared to commonly used imputation approaches.
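
The abstract does not spell out the sppPCA algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It searches one direction at a time for a unit vector that maximizes the variance of sample scores, computing each score from the observed entries only, so no imputation step is needed. The function names (`observed_projection`, `unit_orthogonal`, `pursue_direction`, `pp_pca_sketch`), the rescaling of scores by the observed part of the weight vector, and the use of random hill climbing as the optimizer are all assumptions made for illustration.

```python
import numpy as np


def observed_projection(X, w):
    """Project rows of X onto w using only observed (non-NaN) entries.

    ASSUMPTION: each score is rescaled by the norm of the observed part
    of w, so rows with many missing values are not shrunk toward zero.
    """
    mask = ~np.isnan(X)
    X_filled = np.where(mask, X, 0.0)  # zeros add nothing to the dot product
    num = X_filled @ w
    denom = np.sqrt(mask.astype(float) @ (w ** 2))
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def unit_orthogonal(v, basis):
    """Remove components along earlier directions, then normalize."""
    for b in basis:
        v = v - (v @ b) * b
    return v / np.linalg.norm(v)


def pursue_direction(X, basis, n_iter=1000, step=0.2, seed=0):
    """Random hill climbing (an illustrative optimizer, chosen for brevity)
    for a unit vector, orthogonal to `basis`, that maximizes score variance."""
    rng = np.random.default_rng(seed)
    w = unit_orthogonal(rng.normal(size=X.shape[1]), basis)
    best = np.var(observed_projection(X, w))
    for _ in range(n_iter):
        cand = unit_orthogonal(w + step * rng.normal(size=w.size), basis)
        val = np.var(observed_projection(X, cand))
        if val > best:  # keep the candidate only if it improves the index
            w, best = cand, val
    return w


def pp_pca_sketch(X, n_components=2, seed=0):
    """Find directions sequentially; returns (loadings, scores)."""
    Xc = X - np.nanmean(X, axis=0)  # center each column on its observed mean
    basis = []
    for k in range(n_components):
        basis.append(pursue_direction(Xc, basis, seed=seed + k))
    loadings = np.column_stack(basis)
    scores = np.column_stack([observed_projection(Xc, w) for w in basis])
    return loadings, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 6))
    X[rng.random(X.shape) < 0.2] = np.nan  # synthetic missing values
    loadings, scores = pp_pca_sketch(X)
    print(loadings.shape, scores.shape)  # (6, 2) (40, 2)
```

By contrast, a covariance matrix computed directly from such a matrix (for example with `np.cov`) contains NaN wherever a column has a missing value, which is why standard PCA cannot be run on it without first imputing or filtering.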

Similar articles

1

Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.

Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.

2

MultiAlign: a multiple LC-MS analysis tool for targeted omics analysis.

BMC Bioinformatics. 2013 Feb 12;14:49. doi: 10.1186/1471-2105-14-49.

3

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.

Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.

4

Processing and analysis of GC/LC-MS-based metabolomics data.

Methods Mol Biol. 2011;708:277-98. doi: 10.1007/978-1-61737-985-7_17.

5

Mass spectrometry: from proteomics to metabolomics and lipidomics.

Chem Soc Rev. 2009 Jul;38(7):1882-96. doi: 10.1039/b618553n. Epub 2009 Feb 4.

6

Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method.

Proteomics. 2003 Sep;3(9):1680-6. doi: 10.1002/pmic.200300515.

7

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics.

Int J Mol Sci. 2021 Sep 6;22(17):9650. doi: 10.3390/ijms22179650.

8

Detection of batch effects in liquid chromatography-mass spectrometry metabolomic data using guided principal component analysis.

Talanta. 2014 Dec;130:442-8. doi: 10.1016/j.talanta.2014.07.031. Epub 2014 Jul 18.

9

Haystack, a web-based tool for metabolomics research.

BMC Bioinformatics. 2014;15 Suppl 11(Suppl 11):S12. doi: 10.1186/1471-2105-15-S11-S12. Epub 2014 Oct 21.

10

[Effect of menthol cigarette on rats for metabonomics by liquid chromatography-mass spectrometry].

Se Pu. 2010 Aug;28(8):765-8. doi: 10.3724/sp.j.1123.2010.00765.

Cited by

1

Investigation of Influences on Indoor and Outdoor SVOC Exposure.

Int J Environ Res Public Health. 2025 Apr 3;22(4):556. doi: 10.3390/ijerph22040556.

2

Deciphering ApoE Genotype-Driven Proteomic and Lipidomic Alterations in Alzheimer's Disease Across Distinct Brain Regions.

J Proteome Res. 2024 Aug 2;23(8):2970-2985. doi: 10.1021/acs.jproteome.3c00604. Epub 2024 Jan 18.

3

Expanding the access of wearable silicone wristbands in community-engaged research through best practices in data analysis and integration.

Pac Symp Biocomput. 2024;29:170-186.

4

Expanding the access of wearable silicone wristbands in community-engaged research through best practices in data analysis and integration.

bioRxiv. 2023 Oct 2:2023.09.29.560217. doi: 10.1101/2023.09.29.560217.

5

Proteogenomic and metabolomic characterization of human glioblastoma.

Cancer Cell. 2021 Apr 12;39(4):509-528.e20. doi: 10.1016/j.ccell.2021.01.006. Epub 2021 Feb 11.

6

Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies.

Rapid Commun Mass Spectrom. 2017 Mar 15;31(5):447-456. doi: 10.1002/rcm.7808.

7

An integrative imputation method based on multi-omics datasets.

BMC Bioinformatics. 2016 Jun 21;17:247. doi: 10.1186/s12859-016-1122-6.

8

Effects of imputation on correlation: implications for analysis of mass spectrometry data from multiple biological matrices.

Brief Bioinform. 2017 Mar 1;18(2):312-320. doi: 10.1093/bib/bbw010.

9

Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.

J Proteome Res. 2015 May 1;14(5):1993-2001. doi: 10.1021/pr501138h. Epub 2015 Apr 22.

10

A Statistical Analysis of the Effects of Urease Pre-treatment on the Measurement of the Urinary Metabolome by Gas Chromatography-Mass Spectrometry.

Metabolomics. 2014 Oct 1;10(5):897-908. doi: 10.1007/s11306-014-0642-1.
