Geological Survey of Norway, PO Box 6315 Sluppen, N-7491 Trondheim, Norway.
Sci Total Environ. 2012 Jun 1;426:196-210. doi: 10.1016/j.scitotenv.2012.02.032. Epub 2012 Apr 12.
Applied geochemistry and environmental sciences invariably deal with compositional data. Classically, the original or log-transformed absolute element concentrations are studied. However, compositional data do not vary independently, and a concentration-based approach to data analysis can lead to faulty conclusions. For this reason a better statistical approach, based exclusively on relative information, was introduced in the 1980s. Because the difference between the two methods should be most pronounced in large-scale, and therefore highly variable, datasets, a new dataset of agricultural soils, covering all of Europe (5.6 million km²) at an average sampling density of 1 site/2500 km², is used here to demonstrate and compare both approaches. Absolute element concentrations are certainly of interest in a variety of applications and can be provided in tabulations or concentration maps. Maps for the opened data (ratios to other elements) provide more specific additional information. For compositional data, XY plots of raw or log-transformed data should be used only with care, in an exploratory data analysis (EDA) sense, to detect unusual data behaviour or candidate subgroups of samples, or to compare pre-defined groups of samples. Correlation analysis and the Euclidean distance are not mathematically meaningful concepts for this data type. Element relationships have to be investigated via a stability measure of the (log-)ratios of elements. Logratios are also the key ingredient for an appropriate multivariate analysis of compositional data.
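The log-ratio idea in the abstract can be illustrated with a small numerical sketch. The abstract does not define its stability measure, so the code below substitutes a standard quantity from compositional data analysis, Aitchison's variation matrix (the variance of each pairwise log-ratio), alongside the centred log-ratio (clr) transform. The function names `clr` and `variation_matrix` and the synthetic three-part data are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of an (n_samples, n_parts) array of positive values."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def variation_matrix(x):
    """Variation matrix: T[i, j] = sample variance of log(x_i / x_j) across rows."""
    logx = np.log(x)
    # Pairwise log-ratios via broadcasting: shape (n_samples, n_parts, n_parts)
    diff = logx[:, :, None] - logx[:, None, :]
    return diff.var(axis=0, ddof=1)

# Synthetic, hypothetical data: parts a and b are nearly proportional, c is unrelated.
rng = np.random.default_rng(0)
n = 200
base = rng.lognormal(mean=0.0, sigma=1.0, size=n)
a = base * rng.lognormal(mean=0.0, sigma=0.05, size=n)
b = 2.0 * base * rng.lognormal(mean=0.0, sigma=0.05, size=n)
c = rng.lognormal(mean=0.0, sigma=1.0, size=n)

raw = np.column_stack([a, b, c])
closed = raw / raw.sum(axis=1, keepdims=True)  # closure: each row sums to 1

T_raw = variation_matrix(raw)
T_closed = variation_matrix(closed)
Z = clr(closed)
```

In this sketch a small entry of the variation matrix marks a stable ratio between two parts (here a and b), while a large entry marks an unstable one (a and c). Because only ratios enter the calculation, the matrix is the same for the raw and the closed data, which is the sense in which the approach relies on relative information only.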