Permutation-validated principal components analysis of microarray data.

Author information

Landgrebe Jobst, Wurst Wolfgang, Welzl Gerhard

Affiliation

Institute of Biomathematics and Biometry, GSF-National Research Center for Environment and Health, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.

Publication information

Genome Biol. 2002;3(4):RESEARCH0019. doi: 10.1186/gb-2002-3-4-research0019. Epub 2002 Mar 22.

Abstract

BACKGROUND

In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

RESULTS

We used PCA to detect the major sources of variance underlying the hybridization conditions, followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well-characterized yeast cell-cycle data and to two datasets from our laboratory. We were able to describe the major sources of variance, select informative genes and visualize the relationship between genes and arrays. We observed differences in the level of explained variance and in the interpretability of the selected genes.
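The first step described above, finding the major sources of variance across hybridizations, can be illustrated with a minimal sketch. It assumes a genes x arrays matrix with genes as observations and arrays as variables, uses column centering, and runs on simulated data; the function name `pca_explained_variance` and all data are hypothetical, and the abstract does not specify the authors' preprocessing.

```python
import numpy as np

def pca_explained_variance(expr):
    """PCA over the arrays (columns) of a genes x arrays expression matrix.

    Returns the gene scores, the array loadings, and the fraction of
    variance explained by each component.
    """
    # Center each array (column); this centering choice is an assumption.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Singular value decomposition: centered = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s        # gene coordinates on each component
    loadings = vt.T       # array (hybridization) weights per component
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained

rng = np.random.default_rng(0)
# Simulated data (hypothetical): 200 genes, 2 groups of 3 arrays each.
expr = rng.normal(size=(200, 6))
expr[:20, 3:] += 3.0      # 20 genes shifted in the second group
scores, loadings, explained = pca_explained_variance(expr)
print(np.round(explained, 3))
```

Plotting the first two columns of `scores` and `loadings` together gives a biplot-style view of genes and arrays, which is one way to visualize the relationship the abstract refers to.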

CONCLUSIONS

Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.
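The idea of selecting genes by relating between-group to within-group variance, validated by permutation, can be sketched as follows. This is an illustrative stand-in, not the authors' test statistic: it assumes a per-gene one-way ANOVA-style ratio of mean squares and a null distribution built by shuffling the group labels of the arrays. The names `f_ratio` and `permutation_pvalues`, the 0.05 cutoff, and the simulated data are all hypothetical.

```python
import numpy as np

def f_ratio(expr, labels):
    """Per-gene ratio of between-group to within-group mean squares."""
    groups = np.unique(labels)
    n_arrays = expr.shape[1]
    grand = expr.mean(axis=1, keepdims=True)
    between = np.zeros(expr.shape[0])
    within = np.zeros(expr.shape[0])
    for g in groups:
        block = expr[:, labels == g]
        mean_g = block.mean(axis=1, keepdims=True)
        between += block.shape[1] * (mean_g - grand).ravel() ** 2
        within += ((block - mean_g) ** 2).sum(axis=1)
    ms_between = between / (len(groups) - 1)
    ms_within = within / (n_arrays - len(groups))
    return ms_between / ms_within

def permutation_pvalues(expr, labels, n_perm=500, seed=0):
    """Permutation p-value per gene, obtained by shuffling array labels."""
    rng = np.random.default_rng(seed)
    observed = f_ratio(expr, labels)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        exceed += f_ratio(expr, shuffled) >= observed
    # Add-one correction keeps p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 4)       # 3 groups of 4 replicate arrays
expr = rng.normal(size=(100, 12))      # simulated: 100 genes x 12 arrays
expr[:10, labels == 1] += 3.0          # 10 genes differ in group 1
pvals = permutation_pvalues(expr, labels)
selected = np.flatnonzero(pvals < 0.05)
print(selected)
```

Because the ratio is computed per gene, a gene with large replicate scatter needs a larger group difference to be selected, which mirrors the abstract's point about accounting for reproducibility and gene-specific scatter. With many genes, a multiple-testing correction would normally be applied on top of these p-values.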

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e8e5/115254/5bd9c474c389/gb-2002-3-4-research0019-1.jpg
