Count data in biology-Data transformation or model reformation?

Authors

St-Pierre Anne P, Shikon Violaine, Schneider David C

Affiliations

Department of Ocean Sciences Ocean Sciences Centre Memorial University of Newfoundland St. John's NL Canada.

Department of Biology Memorial University of Newfoundland St. John's NL Canada.

Publication information

Ecol Evol. 2018 Feb 16;8(6):3077-3085. doi: 10.1002/ece3.3807. eCollection 2018 Mar.

DOI:10.1002/ece3.3807

PMID:29607007

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5869353/

Abstract

Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for t and F tests. Over the years, there has been a movement from data transformation toward model reformation, that is, the use of non-normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original, rather than the transformed, scale. However, data transformation has been shown to give better control over type I error for simulated data with known error structures. We conducted a literature review of statistical textbooks directed toward biologists and of journal articles published in the primary literature to determine temporal trends in both the textbook recommendations and the practice in the refereed literature over the past 35 years. In this review, a trend of increasing use of reformation in the primary literature was evident, moving from no use of reformation before 1996 to >50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets in which we compared the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar P-values within each analysis, but as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back-transform the coefficient estimates obtained from linear models performed on transformed datasets. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology.
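
The contrast between coefficient scales described in the abstract can be illustrated with a small sketch. It is not the authors' analysis: the counts below are hypothetical, the design is reduced to two groups, and the `log(y + 1)` offset is an assumed way of handling zeros. With a single binary predictor, ordinary least squares reduces to group means on the transformed scale, and a Poisson GLM with a log link fits each group's arithmetic mean exactly, so no fitting library is needed.

```python
import math
from statistics import mean

# Hypothetical counts (NOT from the paper): organisms per quadrat in two groups.
control = [0, 1, 1, 2, 3, 5, 8]
treated = [2, 4, 4, 7, 9, 15, 22]

def two_group_ols(y0, y1, transform):
    """OLS with one binary predictor: coefficients are means on the transformed scale."""
    m0 = mean(transform(v) for v in y0)
    m1 = mean(transform(v) for v in y1)
    return m0, m1 - m0  # intercept, treatment coefficient

def two_group_poisson_glm(y0, y1):
    """Poisson GLM (log link), one binary predictor: the MLE reproduces each group mean."""
    m0, m1 = mean(y0), mean(y1)
    return math.log(m0), math.log(m1 / m0)  # intercept, treatment coefficient

sqrt_fit = two_group_ols(control, treated, math.sqrt)
log_fit = two_group_ols(control, treated, lambda v: math.log(v + 1))  # +1 is an assumption
glm_fit = two_group_poisson_glm(control, treated)

# Naive back-transformations of the intercept to a control-group mean.
back_sqrt = sqrt_fit[0] ** 2
back_log = math.exp(log_fit[0]) - 1
glm_mean = math.exp(glm_fit[0])

print("treatment coefficient (sqrt scale):", sqrt_fit[1])
print("treatment coefficient (log(y+1) scale):", log_fit[1])
print("treatment coefficient (GLM, log link):", glm_fit[1])
print("control mean, observed:", mean(control))
print("control mean, back-transformed sqrt:", back_sqrt)
print("control mean, back-transformed log:", back_log)
print("control mean, GLM:", glm_mean)
```

In this sketch the three treatment coefficients live on different scales and are not directly comparable, and the naive back-transformed intercepts fall below the observed arithmetic mean (a consequence of Jensen's inequality), whereas the GLM estimate returns to the original scale without adjustment.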



Similar articles

A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

J Ment Health Policy Econ. 2002 Mar;5(1):21-31.

Evaluating the double Poisson generalized linear model.

Accid Anal Prev. 2013 Oct;59:497-505. doi: 10.1016/j.aap.2013.07.017. Epub 2013 Jul 21.

Ecotoxicology is not normal: A comparison of statistical approaches for analysis of count and proportion data in ecotoxicology.

Environ Sci Pollut Res Int. 2015 Sep;22(18):13990-9. doi: 10.1007/s11356-015-4579-3. Epub 2015 May 9.

Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms II: A Method to Obtain First-Level Analysis Residuals with Uniform and Gaussian Spatial Autocorrelation Function and Independent and Identically Distributed Time-Series.

Brain Connect. 2018 Feb;8(1):10-21. doi: 10.1089/brain.2017.0522.

[Meta-analysis of the Italian studies on short-term effects of air pollution].

Epidemiol Prev. 2001 Mar-Apr;25(2 Suppl):1-71.

Transformations of count data for tests of interaction in factorial and split-plot experiments.

J Econ Entomol. 2006 Jun;99(3):1002-6. doi: 10.1603/0022-0493-99.3.1002.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

Poisson Counts, Square Root Transformation and Small Area Estimation: Square Root Transformation.

Sankhya B (2008). 2022;84(2):449-471. doi: 10.1007/s13571-021-00269-8. Epub 2021 Oct 11.

Impact of the 1990 Hong Kong legislation for restriction on sulfur content in fuel.

Res Rep Health Eff Inst. 2012 Aug(170):5-91.

Cited by

Generalized linear modeling of flow cytometry data to analyze immune responses in tuberculosis vaccine research.

NPJ Syst Biol Appl. 2025 Aug 10;11(1):90. doi: 10.1038/s41540-025-00572-4.

Closing the multichannel gap through computational reconstruction of interaction in super-resolution microscopy.

Patterns (N Y). 2025 Mar 27;6(5):101181. doi: 10.1016/j.patter.2025.101181. eCollection 2025 May 9.

Statistical data transformation in agrarian sciences for variance analysis: a systematic review.

F1000Res. 2024 Jul 12;13:459. doi: 10.12688/f1000research.144805.2. eCollection 2024.

A Novel One-Sample Mendelian Randomization Approach for Count-Type Outcomes That Is Robust to Correlated and Uncorrelated Pleiotropic Effects.

Genet Epidemiol. 2025 Jan;49(1):e22602. doi: 10.1002/gepi.22602. Epub 2024 Nov 5.

Genetic and phenotypic parameters for sexual precocity and parasite resistance traits in Nellore cattle.

J Appl Genet. 2023 Dec;64(4):797-807. doi: 10.1007/s13353-023-00781-9. Epub 2023 Sep 8.

Automated quantification and statistical assessment of proliferating cardiomyocyte rates in embryonic hearts.

Am J Physiol Heart Circ Physiol. 2023 Mar 1;324(3):H288-H292. doi: 10.1152/ajpheart.00483.2022. Epub 2022 Dec 23.

Compositional Dynamics of Gastrointestinal Tract Microbiomes Associated with Dietary Transition and Feeding Cessation in Lake Sturgeon Larvae.

Microorganisms. 2022 Sep 19;10(9):1872. doi: 10.3390/microorganisms10091872.

Dataset for effects of the transition from dry forest to pasture on diversity and structure of bacterial communities in Northeastern Brazil.

Data Brief. 2022 Jan 19;41:107842. doi: 10.1016/j.dib.2022.107842. eCollection 2022 Apr.

Adequate statistical modelling and data selection are essential when analysing abundance and diversity trends.

Nat Ecol Evol. 2021 May;5(5):592-594. doi: 10.1038/s41559-021-01427-x. Epub 2021 Apr 5.

Functional Redundancy in bird community decreases with riparian forest width reduction.

Ecol Evol. 2018 Oct 11;8(21):10395-10408. doi: 10.1002/ece3.4448. eCollection 2018 Nov.

References

To transform or not to transform: using generalized linear mixed models to analyse reaction time data.

Front Psychol. 2015 Aug 7;6:1171. doi: 10.3389/fpsyg.2015.01171. eCollection 2015.

Acute effects of removing large fish from a near-pristine coral reef.

Mar Biol. 2010;157(12):2739-2750. doi: 10.1007/s00227-010-1533-2. Epub 2010 Aug 26.

Sociality, density-dependence and microclimates determine the persistence of populations suffering from a novel fungal disease, white-nose syndrome.

Ecol Lett. 2012 Sep;15(9):1050-7. doi: 10.1111/j.1461-0248.2012.01829.x. Epub 2012 Jul 2.

The arcsine is asinine: the analysis of proportions in ecology.

Ecology. 2011 Jan;92(1):3-10. doi: 10.1890/10-0340.1.

The use of transformations.

Biometrics. 1947 Mar;3(1):39-52.

Some consequences when the assumptions for the analysis of variance are not satisfied.

Biometrics. 1947 Mar;3(1):22-38.

The assumptions underlying the analysis of variance.

Biometrics. 1947 Mar;3(1):1-21.

Generalized linear mixed models: a practical guide for ecology and evolution.

Trends Ecol Evol. 2009 Mar;24(3):127-35. doi: 10.1016/j.tree.2008.10.008.

Model selection and logarithmic transformation in allometric analysis.

Physiol Biochem Zool. 2008 Jul-Aug;81(4):496-507. doi: 10.1086/589110.
