To bin or not to bin: why parasite abundance data should not be lumped into categories for statistical analysis.

Author information

Poulin Robert

Affiliation

Department of Zoology, University of Otago, Dunedin, New Zealand.

Publication information

Parasitology. 2025 Mar;152(3):338-345. doi: 10.1017/S003118202500040X.

Abstract

The impact of macroparasites on their hosts is proportional to the number of parasites per host, or parasite abundance. Abundance values are count data, i.e. integers ranging from 0 to some maximum number, depending on the host-parasite system. When using parasite abundance as a predictor in statistical analysis, a common approach is to bin values, i.e. group hosts into infection categories based on abundance, and test for differences in some response variable (e.g. a host trait) among these categories. There are well-documented pitfalls associated with this approach. Here, I use a literature review to show that binning abundance values for analysis has been used in one-third of studies published in parasitological journals over the past 15 years, and in half of the studies in ecological and behavioural journals, often without any justification. Binning abundance data into arbitrary categories has been much more common among studies using experimental infections than among those using naturally infected hosts. I then use simulated data to demonstrate that true and significant relationships between parasite abundance and host traits can be missed when abundance values are binned for analysis and, conversely, that when there is no underlying relationship between abundance and host traits, analysis of binned data can create a spurious one. This holds regardless of the prevalence of infection or the level of parasite aggregation in a host sample. These findings argue strongly for the practice of binning abundance data as a predictor variable to be abandoned in favour of more appropriate analytical approaches.
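The information loss described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's simulation design: the negative binomial parameters (`mean`, `k`), the linear effect (`slope`), the noise level, the three bin labels and the `low_max` cut-off are all assumptions chosen for illustration. It draws aggregated parasite counts, generates a host trait that depends linearly on abundance, and compares the share of trait variance explained by the raw counts (R² of a simple linear regression) with the share explained by binned infection categories (η² of a one-way ANOVA).

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negative_binomial(mean, k, rng):
    """Gamma-Poisson mixture: counts with the given mean and aggregation parameter k."""
    return poisson(rng.gammavariate(k, mean / k), rng)


def r_squared(x, y):
    """Proportion of variance in y explained by a linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def eta_squared(groups, y):
    """Proportion of variance in y explained by group membership (one-way ANOVA)."""
    grand = sum(y) / len(y)
    total = sum((v - grand) ** 2 for v in y)
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                  for vs in by_group.values())
    return between / total


def bin_abundance(count, low_max=10):
    """Hypothetical, arbitrary binning: uninfected / low / high infection."""
    if count == 0:
        return "uninfected"
    return "low" if count <= low_max else "high"


def simulate(n_hosts=2000, mean=10.0, k=0.5, slope=-0.5, noise_sd=5.0, seed=1):
    """Return (R² using raw counts, η² using binned categories). All defaults are assumed values."""
    rng = random.Random(seed)
    abundance = [negative_binomial(mean, k, rng) for _ in range(n_hosts)]
    trait = [50.0 + slope * a + rng.gauss(0.0, noise_sd) for a in abundance]
    bins = [bin_abundance(a) for a in abundance]
    return r_squared(abundance, trait), eta_squared(bins, trait)


if __name__ == "__main__":
    r2, eta2 = simulate()
    print(f"R2 (raw counts): {r2:.3f}   eta2 (binned): {eta2:.3f}")
```

Under these assumed settings the binned analysis would be expected to explain noticeably less of the trait variance than the raw counts, because hosts with very different parasite loads are pooled within the "high" category. Setting `slope=0.0` and repeating the run over many seeds, ideally with a formal significance test and a count-appropriate model such as a generalized linear model, would be one way to explore the spurious-relationship case the abstract mentions.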

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5366/12186089/7aa2c077359a/S003118202500040X_figAb1.jpg
