Institute for Environmental Sciences, University of Koblenz-Landau, Fortstraße 7, 76829 Landau, Germany.
Environ Sci Pollut Res Int. 2015 Sep;22(18):13990-9. doi: 10.1007/s11356-015-4579-3. Epub 2015 May 9.
Ecotoxicologists often encounter count and proportion data that are rarely normally distributed. To meet the assumptions of the linear model, such data are usually transformed, or non-parametric methods are used if the transformed data still violate the assumptions. Generalized linear models (GLMs) allow such data to be modeled directly, without the need for transformation. Here, we compare the performance of two parametric methods, namely (1) the linear model (assuming normality of transformed data) and (2) GLMs (assuming a Poisson, negative binomial, or binomially distributed response), with that of (3) non-parametric methods. We simulated typical data mimicking low-replicated ecotoxicological experiments for two common data types (counts and proportions from counts). We compared the performance of the different methods in terms of statistical power and Type I error for detecting a general treatment effect and for determining the lowest observed effect concentration (LOEC). In addition, we illustrated the differences using a real-world mesocosm data set. For count data, we found that the quasi-Poisson model yielded the highest power. The negative binomial GLM resulted in increased Type I errors, which could be fixed using the parametric bootstrap. For proportions, binomial GLMs performed better than the linear model, except when determining the LOEC at extremely low sample sizes. The compared non-parametric methods generally had lower power. We recommend that counts in one-factorial experiments be analyzed using quasi-Poisson models and proportions from counts using binomial GLMs. These methods should become standard in ecotoxicology.
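As a rough illustration of the recommended analysis for counts, the following Python sketch simulates over-dispersed counts for a low-replicated one-factorial design and computes a quasi-Poisson F statistic for a general treatment effect. It is not the authors' code: the function names, group means, number of replicates, and dispersion parameter are assumptions chosen for illustration. It relies on the fact that, for a single categorical factor, the fitted means of a Poisson GLM equal the group means, so no iterative fitting is needed.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; adequate for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(means, n_rep, size, rng):
    """Simulate negative binomial counts per treatment via a gamma-Poisson mixture.

    `size` is the (assumed) dispersion parameter; smaller values give more
    over-dispersion. A gamma with shape=size and scale=mu/size has mean mu.
    """
    return [
        [poisson_draw(rng.gammavariate(size, mu / size), rng) for _ in range(n_rep)]
        for mu in means
    ]


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    dev = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        dev += 2.0 * (term - (yi - mi))
    return dev


def quasi_poisson_f(groups):
    """Return (F statistic, dispersion estimate) for a one-factor quasi-Poisson model.

    The dispersion is the Pearson chi-square of the treatment model divided by
    its residual degrees of freedom; it must be non-zero for F to be defined.
    """
    y = [v for g in groups for v in g]
    n, k = len(y), len(groups)
    grand_mean = sum(y) / n
    # Fitted values of the treatment model are the group means.
    fitted = [sum(g) / len(g) for g in groups for _ in g]
    dev_null = poisson_deviance(y, [grand_mean] * n)
    dev_full = poisson_deviance(y, fitted)
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, fitted) if mi > 0)
    phi = pearson / (n - k)
    f_stat = ((dev_null - dev_full) / (k - 1)) / phi
    return f_stat, phi


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    # Hypothetical design: control plus three treatments, three replicates each.
    data = simulate_counts([30, 30, 20, 10], n_rep=3, size=4.0, rng=rng)
    f_stat, phi = quasi_poisson_f(data)
    print(f"F = {f_stat:.2f} on (3, 8) df, dispersion = {phi:.2f}")
```

The F statistic would be compared against an F distribution with (k - 1, n - k) degrees of freedom. Repeating the simulation many times, with and without a true effect, gives estimates of power and Type I error in the spirit of the comparison described above.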