Partitioning large-sample microarray-based gene expression profiles using principal components analysis.

Author information

Peterson Leif E

Affiliation

Department of Medicine, Baylor College of Medicine, One Baylor Plaza ST-924, Houston, TX 77030, USA.

Publication information

Comput Methods Programs Biomed. 2003 Feb;70(2):107-19. doi: 10.1016/s0169-2607(02)00009-3.

DOI:10.1016/s0169-2607(02)00009-3

PMID:12507787

Abstract

Principal components analysis (PCA) is useful for reproducing the total variation among hundreds or thousands of continuously-scaled variables with a much smaller number of unobservable variables called 'latent factors'. The CLUSFAVOR computer program was used to implement PCA for identifying groups of genes with similar expression profiles from a large number of genes used on DNA microarrays. This paper describes the principal components solution to the factor model of the correlation matrix R, calculation of eigenvalues and eigenvectors of R, extraction of factors, calculation of factor loadings, and identification of genes with similar loading patterns to construct groups of genes with similar expression profiles. With regard to extraction of factors, it was found that more than 90% of the total variance in input data could be accounted for by extracting factors whose eigenvalues exceed unity. Bipolar factors containing strong positive and negative loadings can also be used for identifying two unique groups of genes, since expression profiles of genes that load positively are unlike expression profiles of genes that load negatively on the same factor. While PCA does not provide the absolute answer to a multidimensional problem, it nevertheless can provide a heuristic with which natural groupings of genes with similar expression profiles can be assembled. While cluster analysis essentially generates a single dendrogram (tree branch) containing every gene in the input data, PCA can be used to assemble gene expression profiles that strongly correlate with the latent factors accounting for a majority of total variance. Example results for CLUSFAVOR computer program runs are provided.
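
The workflow summarized in the abstract (correlation matrix R, its eigenvalues and eigenvectors, extraction of factors with eigenvalues above unity, factor loadings, and grouping of genes by loading sign) can be illustrated with a minimal NumPy sketch. This is not the CLUSFAVOR implementation. The function name `pca_factor_groups`, the samples-by-genes layout of `X`, and the loading cutoff of 0.7 are all assumptions made for illustration; the abstract does not state a cutoff.

```python
import numpy as np

def pca_factor_groups(X, loading_cutoff=0.7):
    """Illustrative sketch only; not the CLUSFAVOR code.

    X is assumed to be a (samples x genes) array, so genes are the
    variables. loading_cutoff is a hypothetical threshold.
    """
    # Gene-by-gene correlation matrix R.
    R = np.corrcoef(X, rowvar=False)

    # R is symmetric, so eigh applies; sort eigenpairs in descending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Extract factors whose eigenvalues exceed unity.
    keep = eigvals > 1.0

    # Principal components loadings: eigenvector scaled by sqrt(eigenvalue).
    # Each loading is the correlation between a gene and a factor.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

    # Fraction of total variance explained by the retained factors
    # (the eigenvalues of R sum to the number of genes).
    explained = eigvals[keep].sum() / eigvals.sum()

    # For each factor, collect genes loading strongly at either pole.
    # A bipolar factor yields two groups with opposing profiles.
    groups = {}
    for j in range(loadings.shape[1]):
        groups[j] = {
            "positive": np.flatnonzero(loadings[:, j] >= loading_cutoff),
            "negative": np.flatnonzero(loadings[:, j] <= -loading_cutoff),
        }
    return eigvals, loadings, explained, groups
```

Because the sign of an eigenvector is arbitrary, which pole of a bipolar factor is labeled "positive" carries no meaning on its own; only the separation of the two gene groups does.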

Similar articles

Partitioning large-sample microarray-based gene expression profiles using principal components analysis.

Comput Methods Programs Biomed. 2003 Feb;70(2):107-19. doi: 10.1016/s0169-2607(02)00009-3.

Factor analysis of cluster-specific gene expression levels from cDNA microarrays.

Comput Methods Programs Biomed. 2002 Nov;69(3):179-88. doi: 10.1016/s0169-2607(01)00189-4.

CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles.

Genome Biol. 2002 Jun 24;3(7):SOFTWARE0002. doi: 10.1186/gb-2002-3-7-software0002.

Identifying temporally differentially expressed genes through functional principal components analysis.

Biostatistics. 2009 Oct;10(4):667-79. doi: 10.1093/biostatistics/kxp022. Epub 2009 Jul 14.

Vector algebra in the analysis of genome-wide expression data.

Genome Biol. 2002;3(3):RESEARCH0011. doi: 10.1186/gb-2002-3-3-research0011. Epub 2002 Feb 13.

Gene function inference from gene expression of deletion mutants.

Methods Mol Biol. 2007;408:1-18. doi: 10.1007/978-1-59745-547-3_1.

Application of independent component analysis to microarrays.

Genome Biol. 2003;4(11):R76. doi: 10.1186/gb-2003-4-11-r76. Epub 2003 Oct 24.

Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling.

Bioinformatics. 2002 Feb;18(2):287-97. doi: 10.1093/bioinformatics/18.2.287.

Statistical significance of variables driving systematic variation in high-dimensional data.

Bioinformatics. 2015 Feb 15;31(4):545-54. doi: 10.1093/bioinformatics/btu674. Epub 2014 Oct 21.

Toxicogenomics using yeast DNA microarrays.

J Biosci Bioeng. 2010 Nov;110(5):511-22. doi: 10.1016/j.jbiosc.2010.06.003. Epub 2010 Jul 10.

Cited by

Genetic discovery and risk characterization in type 2 diabetes across diverse populations.

HGG Adv. 2021 Apr 8;2(2). doi: 10.1016/j.xhgg.2021.100029. Epub 2021 Mar 9.

Mass Spectrometry-Based Comprehensive Analysis of Pancreatic Cyst Fluids.

Biomed Res Int. 2018 Nov 29;2018:7169595. doi: 10.1155/2018/7169595. eCollection 2018.

Discovery, fine-mapping, and conditional analyses of genetic variants associated with C-reactive protein in multiethnic populations using the Metabochip in the Population Architecture using Genomics and Epidemiology (PAGE) study.

Hum Mol Genet. 2018 Aug 15;27(16):2940-2953. doi: 10.1093/hmg/ddy211.

Transethnic insight into the genetics of glycaemic traits: fine-mapping results from the Population Architecture using Genomics and Epidemiology (PAGE) consortium.

Diabetologia. 2017 Dec;60(12):2384-2398. doi: 10.1007/s00125-017-4405-1. Epub 2017 Sep 13.

A Transcriptomic Signature of Mouse Liver Progenitor Cells.

Stem Cells Int. 2016;2016:5702873. doi: 10.1155/2016/5702873. Epub 2016 Oct 3.

Variant Discovery and Fine Mapping of Genetic Loci Associated with Blood Pressure Traits in Hispanics and African Americans.

PLoS One. 2016 Oct 13;11(10):e0164132. doi: 10.1371/journal.pone.0164132. eCollection 2016.

Immune cell subsets and their gene expression profiles from human PBMC isolated by Vacutainer Cell Preparation Tube (CPT™) and standard density gradient.

BMC Immunol. 2015 Aug 26;16:48. doi: 10.1186/s12865-015-0113-0.

Prospective associations of coronary heart disease loci in African Americans using the MetaboChip: the PAGE study.

PLoS One. 2014 Dec 26;9(12):e113203. doi: 10.1371/journal.pone.0113203. eCollection 2014.

The discrimination of interaural level difference sensitivity functions: development of a taxonomic data template for modelling.

BMC Neurosci. 2013 Oct 7;14:114. doi: 10.1186/1471-2202-14-114.

Transcriptomic profiling of human peritumoral neocortex tissues revealed genes possibly involved in tumor-induced epilepsy.

PLoS One. 2013;8(2):e56077. doi: 10.1371/journal.pone.0056077. Epub 2013 Feb 13.
