Área de Ecología, Departamento de Biología Celular y Ecología, Escuela Politécnica Superior, Universidad de Santiago de Compostela, 27002 Lugo, Spain.
Environ Pollut. 2011 Oct;159(10):2797-800. doi: 10.1016/j.envpol.2011.05.006. Epub 2011 Jun 8.
Multivariate analysis of environmental data sets requires either that there be no missing values or that missing values be replaced by small values. However, if the data are log-transformed before the analysis, this substitution cannot be applied, because the logarithm of a small value may become an outlier. Several methods for substituting missing values can be found in the literature, but none of them guarantees that the structure of the data set is left undistorted. We propose a method for assessing these distortions, which can be used to decide whether to retain the samples or variables containing missing values and to investigate the performance of different substitution techniques. The method analyzes the structure of the distances among samples using Mantel tests. We present an application of the method to PCDD/F (polychlorinated dibenzo-p-dioxin and dibenzofuran) data measured in samples of terrestrial moss as part of a biomonitoring study.
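The idea of comparing distance structures with a Mantel test can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the compared matrices are built, so the example assumes a synthetic data set whose true values are known, a hypothetical detection limit (`limit`), substitution by half that limit, Euclidean distances on log-transformed data, and a Pearson-based permutation test. All function and variable names are illustrative.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between the rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mantel_test(D1, D2, n_perm=999, seed=0):
    """One-sided permutation Mantel test.

    Returns the Pearson correlation between the upper triangles of two
    square distance matrices and a permutation p-value obtained by
    jointly permuting the rows and columns of D2.
    """
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)  # each pair of samples counted once
    x = D1[iu]
    r_obs = np.corrcoef(x, D2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        r_perm = np.corrcoef(x, D2[np.ix_(p, p)][iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    # The observed statistic is counted as one of the permutations.
    p_value = (count + 1) / (n_perm + 1)
    return r_obs, p_value


# Synthetic "complete" data: 20 samples x 6 variables (assumption).
rng = np.random.default_rng(42)
full = rng.lognormal(mean=0.0, sigma=1.0, size=(20, 6))

# Treat values below a hypothetical detection limit as missing and
# substitute them with half the limit (one common choice, assumed here).
limit = 0.3
substituted = np.where(full < limit, limit / 2, full)

# Compare the distance structure before and after substitution.
D_ref = pairwise_distances(np.log(full))
D_sub = pairwise_distances(np.log(substituted))
r, p = mantel_test(D_ref, D_sub)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A correlation close to 1 would suggest that the substitution left the distances among samples largely intact, while a lower value would indicate distortion. With real data the true values are unknown, so the matrices to compare would have to be chosen differently, for example from the subset of variables or samples without missing values.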