Scholz Matthias, Selbig Joachim
Institute of Biochemistry and Biology, University of Potsdam, Germany.
Methods Mol Biol. 2007;358:87-104. doi: 10.1007/978-1-59745-244-1_6.
This chapter provides an overview of visualization and analysis techniques applied to large-scale datasets from genomics, metabolomics, and proteomics. The aim is to reduce the number of variables (genes, metabolites, or proteins) by extracting a small set of new, relevant variables, usually termed components. The advantages and disadvantages of classical principal component analysis (PCA) are discussed, and its link to the closely related singular value decomposition (SVD) and multidimensional scaling (MDS) is outlined. Special emphasis is given to the recent trend toward independent component analysis (ICA), which aims to extract statistically independent components and therefore usually provides more meaningful components than PCA. We also discuss normalization techniques and their influence on the results of the different analytical techniques.
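The link between PCA and the SVD, and the influence of normalization, can be illustrated with a minimal NumPy sketch. It assumes a samples-by-variables matrix (samples in rows, variables such as genes or metabolites in columns), centers each variable, optionally applies unit-variance scaling as one example of a normalization step, and reads the components off the thin SVD of the centered matrix. The function name `pca_svd` and its parameters are hypothetical illustrations, not code from the chapter; it also assumes no variable has zero variance when `standardize=True`.

```python
import numpy as np


def pca_svd(X, n_components=2, standardize=False):
    """Hypothetical PCA helper: PCA via SVD of the centered data matrix.

    X is samples x variables. Returns (scores, loadings, explained_variance).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each variable (column)
    if standardize:
        # Unit-variance scaling (assumes no constant columns):
        # every variable then contributes equally to the total variance.
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # Thin SVD: Xc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T  # variables x components
    scores = U[:, :n_components] * s[:n_components]  # equals Xc @ loadings
    # s**2 / (n - 1) are the eigenvalues of the covariance matrix of Xc
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, loadings, explained_variance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    X[:, 0] *= 100.0  # one variable measured on a much larger scale
    _, load_raw, _ = pca_svd(X, n_components=1)
    _, load_std, _ = pca_svd(X, n_components=1, standardize=True)
    print("PC1 loadings, raw:         ", np.round(load_raw[:, 0], 3))
    print("PC1 loadings, standardized:", np.round(load_std[:, 0], 3))
```

Because the squared singular values divided by n − 1 equal the eigenvalues of the covariance matrix, PCA can be computed either from the SVD or from an eigendecomposition. In the demo, PC1 of the raw data is almost entirely the rescaled variable, whereas after standardization all five variables compete on equal footing, which is one concrete way normalization changes the outcome of the analysis.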