Simeonov V, Stratis J A, Samara C, Zachariadis G, Voutsa D, Anthemidis A, Sofoniou M, Kouimtzis Th
Faculty of Chemistry, University of Sofia "St. Kl. Okhridski", J. Bourchier Blvd. 1, 1164 Sofia, Bulgaria.
Water Res. 2003 Oct;37(17):4119-24. doi: 10.1016/S0043-1354(03)00398-1.
This study applies several multivariate statistical approaches to interpret a large and complex data matrix obtained during a monitoring program of surface waters in Northern Greece. The dataset consists of analytical results from a 3-year survey conducted in the major river systems (Aliakmon, Axios, Gallikos, Loudias and Strymon) as well as in streams, tributaries and ditches. Twenty-seven parameters were monitored at 25 key sampling sites on a monthly basis (a total of 22,350 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and multiple regression analysis on principal components. CA revealed four groups of similar sampling sites, reflecting the different physicochemical characteristics and pollution levels of the studied water systems. Six latent factors, together explaining 90% of the total variance of the dataset, were identified as responsible for the data structure; they are conditionally named the organic, nutrient, physicochemical, weathering, soil-leaching and toxic-anthropogenic factors. A multivariate receptor model was also applied for source apportionment, estimating the contribution of the identified sources to the concentration of the physicochemical parameters. The study demonstrates the necessity and usefulness of multivariate statistical assessment of large and complex databases for obtaining better information about surface water quality, designing sampling and analytical protocols, and controlling and managing surface water pollution effectively.
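The PCA and regression-on-principal-components steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure or data: the matrix below is synthetic (two hypothetical sources mixed into eight hypothetical parameters), the function names are invented for illustration, and the 90% threshold is borrowed from the abstract only as a stopping rule. Cluster analysis and the receptor model itself are omitted.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores, so parameters with different units are comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(Z, var_target=0.90):
    """PCA via eigendecomposition of the correlation matrix of z-scored data.

    Keeps the smallest number of components whose cumulative explained
    variance reaches var_target. Returns (scores, loadings, variance ratios).
    """
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    k = int(np.searchsorted(np.cumsum(ratio), var_target)) + 1
    k = min(k, len(eigvals))
    return Z @ eigvecs[:, :k], eigvecs[:, :k], ratio[:k]


def regress_on_scores(scores, y):
    """Ordinary least squares of y on the PC scores, with an intercept.

    Returns the coefficient vector (intercept first) and R^2.
    """
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return coef, r2


# Synthetic stand-in for a monitoring matrix (rows = samples, columns = parameters).
rng = np.random.default_rng(0)
n_samples, n_params, n_sources = 300, 8, 2
sources = rng.normal(size=(n_samples, n_sources))
mixing = rng.normal(size=(n_sources, n_params))
X = sources @ mixing + 0.1 * rng.normal(size=(n_samples, n_params))

Z = standardize(X)
scores, loadings, ratio = pca(Z)
coef, r2 = regress_on_scores(scores, X[:, 0])

print("components kept:", scores.shape[1])
print("cumulative explained variance:", round(float(ratio.sum()), 3))
print("R^2 for parameter 0:", round(float(r2), 3))
```

In a sketch like this, the loadings indicate which parameters group into each latent factor, and the regression coefficients give a rough sense of how strongly each factor contributes to a single parameter; a real source apportionment would need additional steps that the abstract does not detail.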