St-Pierre Anne P, Shikon Violaine, Schneider David C
Department of Ocean Sciences, Ocean Sciences Centre, Memorial University of Newfoundland, St. John's, NL, Canada.
Department of Biology, Memorial University of Newfoundland, St. John's, NL, Canada.
Ecol Evol. 2018 Feb 16;8(6):3077-3085. doi: 10.1002/ece3.3807. eCollection 2018 Mar.
Statistical analyses are an integral component of scientific research, and for decades biologists have applied transformations to data to meet the normal error assumptions of t and F tests. Over the years, there has been a movement from data transformation toward model reformation, that is, the use of non-normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original scale rather than the transformed scale. However, data transformation has been shown to give better control over type I error for simulated data with known error structures. We reviewed statistical textbooks directed toward biologists and journal articles published in the primary literature to determine temporal trends, over the past 35 years, in both textbook recommendations and practice in the refereed literature. The review showed a clear trend of increasing use of reformation in the primary literature, moving from no use of reformation before 1996 to >50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets, comparing the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar P-values within each analysis, but, as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back-transform the coefficient estimates obtained from linear models fitted to transformed data. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology.
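As a rough illustration of why coefficient estimates differ across the three approaches, the sketch below compares a two-group effect estimated after a square root transformation, after a log transformation, and under a Poisson GLM with a log link. The counts are hypothetical and are not taken from the 12 published datasets; the `+ 1` offset inside the log and the Poisson error structure are assumptions made for illustration only. For a two-group design, the ordinary least squares slope equals the difference in group means on the transformed scale, and the Poisson GLM slope has a closed form, so no fitting library is needed.

```python
import math
from statistics import mean

# Hypothetical counts for two groups (illustrative only, not from the paper).
control = [2, 3, 5, 4, 6]
treated = [8, 12, 7, 15, 8]


def group_effect(transform):
    """Difference in group means after applying `transform` to each count.

    For a two-group design this equals the OLS slope for the group
    indicator on the transformed scale.
    """
    return mean(transform(y) for y in treated) - mean(transform(y) for y in control)


# Linear model on square-root-transformed counts.
sqrt_effect = group_effect(math.sqrt)

# Linear model on log-transformed counts; the +1 offset is an assumption
# commonly used to accommodate zero counts.
log_effect = group_effect(lambda y: math.log(y + 1))

# Poisson GLM with log link: for a two-group model the maximum likelihood
# fitted means equal the sample means, so the slope is the log of their ratio.
glm_effect = math.log(mean(treated) / mean(control))

print(f"sqrt-scale effect:      {sqrt_effect:.3f}")
print(f"log(y + 1)-scale effect: {log_effect:.3f}")
print(f"Poisson GLM effect:      {glm_effect:.3f}")
print(f"GLM ratio of means:      {math.exp(glm_effect):.3f}")
```

Exponentiating the GLM coefficient returns the ratio of arithmetic group means on the original scale, whereas exponentiating the log-scale coefficient returns a ratio of geometric means of `y + 1`, which is one reason a single agreed back-transformation is hard to define.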