Jeroen J. Jansen, Huub C. J. Hoefsloot, Hans F. M. Boelens, Jan van der Greef, Age K. Smilde
Biosystems Data Analysis, Faculty of Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands.
Bioinformatics. 2004 Oct 12;20(15):2438-46. doi: 10.1093/bioinformatics/bth268. Epub 2004 Apr 15.
Metabolomics datasets are generally large and complex. Principal component analysis (PCA) gives a simplified view of the variation in the data: the PCA model can be interpreted, and the processes underlying that variation can be analysed. In metabolomics, a priori information about the data is often available. Various forms of this information can be incorporated into an unsupervised data analysis by means of weighted PCA (WPCA). A WPCA model gives a view of the data that differs from the one obtained with PCA, and it adds to the interpretation of the information in a metabolomics dataset.
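As a minimal illustration of the simplified view that PCA provides, the sketch below computes PCA of a samples-by-variables matrix through the singular value decomposition of the column-centred data. It assumes NumPy and random stand-in data; it is not taken from the authors' M-files, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Plain PCA of a samples-by-variables matrix via the SVD (illustrative sketch).

    Rows of X are samples (e.g. spectra), columns are variables (e.g. spectral bins).
    Returns scores T, loadings P and the fraction of total variance per component.
    """
    Xc = X - X.mean(axis=0)                        # column-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores (samples x components)
    P = Vt[:n_components].T                        # loadings (variables x components)
    explained = s**2 / np.sum(s**2)                # variance fraction per component
    return T, P, explained[:n_components]

# Random stand-in data: 20 "spectra" with 50 variables each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
T, P, ev = pca(X, 2)
```

Keeping all components reproduces the centred data exactly as `T @ P.T`; keeping only the first few gives the low-dimensional view that is interpreted in practice.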
A method is presented that translates spectra of repeated measurements into weights describing the experimental error. These weights are then used in the data analysis with WPCA. The resulting WPCA model gives a view of the data in which the non-uniform experimental error is accounted for, and it therefore focuses more on the natural variation in the data.
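The weighting idea can be sketched as follows, under stated assumptions. The helper `replicate_weights` (hypothetical) turns repeated spectra into one weight per variable, taken as the inverse of the replicate variance; the paper's exact mapping from repeats to weights may differ. The helper `weighted_pca` (also hypothetical) fits a rank-k model by weighted least squares using iterative majorization, a generic technique for element-wise weights that is swapped in here and is not necessarily the algorithm in the authors' pcaw.zip.

```python
import numpy as np

def replicate_weights(replicates, eps=1e-12):
    """Hypothetical mapping from repeated spectra to per-variable weights.

    replicates: (n_repeats, n_variables) array, n_repeats >= 2.
    Assumed choice: weight = 1 / replicate variance (noisier variables count less).
    """
    var = replicates.var(axis=0, ddof=1)
    return 1.0 / (var + eps)

def weighted_pca(X, W, n_components, n_iter=500, tol=1e-10):
    """Rank-k weighted least-squares fit of column-centred X by iterative majorization.

    Minimises sum(W * (Xc - M)**2) over rank-k matrices M. W may be a full matrix
    or a per-variable vector that broadcasts over the rows of X.
    Returns scores T, loadings P and the fitted matrix M (M == T @ P.T).
    """
    k = n_components
    Xc = X - X.mean(axis=0)                  # ordinary column centring (an assumption)
    W = np.broadcast_to(W, X.shape).astype(float)
    Wn = W / W.max()                         # weights rescaled into (0, 1]
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    M = (U[:, :k] * s[:k]) @ Vt[:k]          # start from the ordinary PCA fit
    loss_old = np.sum(W * (Xc - M) ** 2)
    for _ in range(n_iter):
        # Majorizing target: full-weight entries take the data value,
        # low-weight entries stay close to the current fit.
        Z = M + Wn * (Xc - M)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        M = (U[:, :k] * s[:k]) @ Vt[:k]      # best rank-k approximation of Z
        loss = np.sum(W * (Xc - M) ** 2)     # weighted loss, non-increasing per step
        if loss_old - loss <= tol * loss_old:
            break
        loss_old = loss
    T = U[:, :k] * s[:k]
    P = Vt[:k].T
    return T, P, M

# Stand-in data: 15 samples, 30 variables, plus 5 repeats whose noise grows with the variable index.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 30))
reps = rng.normal(scale=np.linspace(0.1, 2.0, 30), size=(5, 30))
w = replicate_weights(reps)
T, P, M = weighted_pca(X, w, n_components=2)
```

Starting from the ordinary PCA fit means the weighted loss can only decrease from the PCA solution; with uniform weights the routine simply returns the ordinary PCA fit. A per-variable weight vector broadcasts across all samples, while element-wise weights (one per sample and variable) can be passed as a full matrix.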
MATLAB M-files implementing the algorithm used in this research are available at http://www-its.chem.uva.nl/research/pac/Software/pcaw.zip.