Principal component analysis for designed experiments.

Author information

Konishi Tomokazu

Publication information

BMC Bioinformatics. 2015;16 Suppl 18(Suppl 18):S7. doi: 10.1186/1471-2105-16-S18-S7. Epub 2015 Dec 9.

DOI:10.1186/1471-2105-16-S18-S7

PMID:26678818

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4682404/

Abstract

BACKGROUND

Principal component analysis (PCA) is used to summarize matrix data, such as those found in transcriptome, proteome, or metabolome studies and medical examinations, into fewer dimensions by fitting the matrix to orthogonal axes. Although this methodology is frequently used in multivariate analyses, it has disadvantages when applied to experimental data. First, the identified principal components have poor generality: because the size and direction of the components depend on the particular data set, the components are valid only within that data set. Second, the method is sensitive to experimental noise and to bias between sample groups. It cannot reflect an experimental design that was planned to manage noise and bias; rather, it assigns the same weight and independence to all samples in the matrix. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. First, the principal axes were identified using training data sets and shared across experiments. These training data reflect the design of the experiments, and their preparation allows noise to be reduced and group bias to be removed. Second, the center of the rotation was determined in accordance with the experimental design. Third, the resulting components were scaled to unify their size unit.
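The three options can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function names `fit_axes` and `scaled_scores`, the use of group means as training data, the choice of the control-group mean as the center, and the division by the square root of the number of items are all assumptions made for illustration, and the data are simulated.

```python
import numpy as np

def fit_axes(training, center):
    """Find principal axes from training data (hypothetical helper).

    training: (n_training, n_items) array, e.g. one mean profile per group.
    center:   (n_items,) vector chosen from the design, e.g. the control mean.
    Returns the axes as orthonormal rows, shape (n_training, n_items).
    """
    centered = training - center
    # Rows of vt are orthonormal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

def scaled_scores(samples, center, axes):
    """Project samples onto shared axes and scale the scores.

    ASSUMPTION: scores are divided by sqrt(n_items) so that their unit
    does not grow with the number of measured items.
    """
    n_items = samples.shape[1]
    return (samples - center) @ axes.T / np.sqrt(n_items)

# Simulated two-group experiment (illustrative data only).
rng = np.random.default_rng(0)
n_items = 500
base = rng.normal(size=n_items)
effect = np.zeros(n_items)
effect[:50] = 2.0  # 50 items respond to the treatment
control = base + rng.normal(scale=0.3, size=(4, n_items))
treated = base + effect + rng.normal(scale=0.3, size=(4, n_items))

# Option 2: the center comes from the design (here, the control mean).
center = control.mean(axis=0)
# Option 1: axes come from training data (group means average out noise).
training = np.vstack([control.mean(axis=0), treated.mean(axis=0)])
axes = fit_axes(training, center)

# Option 3: score all samples, including an unknown one, on the shared axes.
unknown = base + effect + rng.normal(scale=0.3, size=(1, n_items))
scores = scaled_scores(np.vstack([control, treated, unknown]), center, axes)
pc1 = scores[:, 0]  # rows 0-3 control, rows 4-7 treated, row 8 unknown
```

Because one training row coincides with the center in this toy design, only the first axis carries information; the second corresponds to a zero singular value and its direction is arbitrary. The sign of an axis is also arbitrary, so group comparisons should use differences between scores.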

RESULTS

The effects of these options were observed in microarray experiments, which showed improved separation of groups and robustness to noise. The range of scaled scores was unaffected by the number of items. Additionally, unknown samples were appropriately classified using pre-arranged axes, and these axes reflected the characteristics of the groups in the experiments well. The scaling of the components and the sharing of axes enabled comparisons of components across experiments. The use of training data reduced the effects of noise and bias in the data, facilitating the physical interpretation of the principal axes.
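Why a scaled score can be independent of the number of items is easiest to see in a small simulation. The sketch below assumes, as an illustration only, that scaling means dividing a score by the square root of the number of items; the abstract does not state the scaling formula, and the data are random numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
raw, scaled = {}, {}
for n_items in (100, 10_000):
    # Per-item difference between one sample and the center (unit variance).
    delta = rng.normal(size=n_items)
    # An axis pointing along that difference, with unit length.
    axis = delta / np.linalg.norm(delta)
    score = float(delta @ axis)  # unscaled score, grows like sqrt(n_items)
    raw[n_items] = score
    # ASSUMPTION: scaling divides by sqrt(n_items).
    scaled[n_items] = score / np.sqrt(n_items)
```

Under this assumption the unscaled score grows roughly tenfold when the number of items grows a hundredfold, whereas the scaled score stays near the per-item standard deviation, which makes scores from platforms with different numbers of items comparable.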

CONCLUSIONS

Together, these options improve the generality and objectivity of the analytical results. The methodology thus becomes more like a set of multiple regression analyses that find independent models, each specifying one of the axes.



