Principal component analysis versus fuzzy principal component analysis. A case study: the quality of Danube water (1985-1996).

Author information

Sârbu C, Pop H F

Affiliation

Department of Analytical Chemistry, Faculty of Chemistry and Chemical Engineering, "Babeş-Bolyai" University, Arany Janos Str. 11, RO-400028 Cluj-Napoca, Romania.

Publication information

Talanta. 2005 Mar 15;65(5):1215-20. doi: 10.1016/j.talanta.2004.08.047.

Abstract

Principal component analysis (PCA) is a favorite tool in environmetrics for data compression and information extraction. PCA finds linear combinations of the original measurement variables that describe the significant variations in the data. However, it is well known that PCA, like any other multivariate statistical method, is sensitive to outliers, missing data, and poor linear correlation between variables caused by poorly distributed variables. As a result, data transformations have a large impact on PCA. In this regard, one of the most powerful approaches to improving PCA appears to be the fuzzification of the data matrix, which diminishes the influence of outliers. In this paper we discuss and apply a robust fuzzy PCA algorithm (FPCA). The efficiency of the new algorithm is illustrated on a data set concerning the water quality of the Danube River over a period of 11 consecutive years. Considering, for example, a two-component model, FPCA accounts for 91.7% of the total variance, whereas PCA accounts for only 39.8%. Moreover, PCA showed only a partial separation of the variables and no separation of the scores (samples) on the plane described by the first two principal components, whereas a much sharper differentiation of both variables and scores is observed when FPCA is applied.
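
The sensitivity of PCA to outliers, and the idea of reducing their influence through per-sample weights, can be illustrated with a minimal sketch. This is not the FPCA algorithm of the paper, whose details the abstract does not give. It is an ordinary weighted-covariance PCA in which the weighting rule (`distance_weights`), the function names, and the synthetic data are all assumptions made for illustration; no Danube measurements are used.

```python
import numpy as np


def weighted_pca(X, weights=None):
    """Return (explained-variance ratios, principal axes as columns), largest first.

    weights=None gives classical PCA; otherwise each sample contributes to the
    mean and covariance in proportion to its weight.
    """
    X = np.asarray(X, dtype=float)
    w = np.ones(X.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    mean = (w[:, None] * X).sum(axis=0) / w.sum()      # weighted mean
    Xc = X - mean
    cov = (w[:, None] * Xc).T @ Xc / w.sum()           # weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                  # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    return eigvals / eigvals.sum(), eigvecs[:, order]


def distance_weights(X):
    """Hypothetical membership-like weights: smaller for samples far from the median."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    return 1.0 / (1.0 + (d / np.median(d)) ** 2)


def make_demo_data(seed=0):
    """Synthetic data: 100 samples along one latent direction plus 5 identical outliers."""
    rng = np.random.default_rng(seed)
    t = rng.normal(size=100)
    clean = np.outer(t, np.ones(4)) + 0.1 * rng.normal(size=(100, 4))
    outliers = np.tile([0.0, 0.0, 15.0, -15.0], (5, 1))
    return np.vstack([clean, outliers])


X = make_demo_data()
true_axis = np.ones(4) / 2.0                           # unit vector of the latent direction

_, axes_plain = weighted_pca(X)
_, axes_robust = weighted_pca(X, distance_weights(X))

# |cosine| between the first principal axis and the latent direction
print("classical PCA alignment:", abs(axes_plain[:, 0] @ true_axis))
print("weighted PCA alignment: ", abs(axes_robust[:, 0] @ true_axis))
```

In this toy setting the five outliers pull the first classical principal axis away from the direction that structures the bulk of the data, while the down-weighted version recovers it. The numbers it prints say nothing about the 91.7% and 39.8% figures reported in the abstract.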

