German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, 04103, Leipzig, Germany.
Ecology and Evolution Research Centre, School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, NSW, Australia.
Sci Rep. 2021 Sep 24;11(1):19073. doi: 10.1038/s41598-021-98584-7.
Citizen science platforms are quickly accumulating hundreds of millions of biodiversity observations around the world annually. Quantifying and correcting for the biases in citizen science datasets remains an important first step before these data are used to address ecological questions and monitor biodiversity. One source of potential bias among datasets is the difference between those citizen science programs that have unstructured protocols and those that have semi-structured or structured protocols for submitting observations. To quantify biases in an unstructured citizen science platform, we contrasted bird observations from the unstructured iNaturalist platform with those from a semi-structured citizen science platform, eBird, for the continental United States. We tested whether four traits of species (body size, commonness, flock size, and color) predicted if a species was under- or over-represented in the unstructured dataset compared with the semi-structured dataset. We found strong evidence that large-bodied birds were over-represented in the unstructured citizen science dataset; moderate evidence that common species were over-represented in the unstructured dataset; strong evidence that species in large groups were over-represented; and no evidence that colorful species were over-represented in unstructured citizen science data. Our results suggest that biases exist in unstructured citizen science data when compared with semi-structured data, likely as a result of the detectability of a species and the inherent recording process. Importantly, in programs like iNaturalist the detectability process is two-fold: first, an individual organism needs to be detected, and second, it needs to be photographed, which is likely easier for many large-bodied species.
Our results indicate that caution is warranted when using unstructured citizen science data in ecological modelling, and highlight body size as a fundamental trait that can be used as a covariate for modelling opportunistic species occurrence records, representing the detectability or identifiability in unstructured citizen science datasets. Future research in this space should continue to focus on quantifying and documenting biases in citizen science data, and expand our research by including structured citizen science data to understand how biases differ among unstructured, semi-structured, and structured citizen science platforms.
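The comparison described above can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so this is not the authors' analysis: it assumes a simple log-ratio of each species' share of records in the two datasets as the over-representation index, and an ordinary least-squares slope against log body mass. All names (`representation_index`, `ols_slope`, the species labels) and all numbers are hypothetical, invented only to show the mechanics.

```python
import math


def representation_index(unstructured_counts, semi_structured_counts):
    """Log-ratio of each species' share of records in the two datasets.

    Positive values mean the species is over-represented in the
    unstructured dataset; negative values mean under-represented.
    Assumes every shared species has a positive count in both datasets.
    """
    shared = sorted(set(unstructured_counts) & set(semi_structured_counts))
    total_u = sum(unstructured_counts[s] for s in shared)
    total_s = sum(semi_structured_counts[s] for s in shared)
    return {
        s: math.log(
            (unstructured_counts[s] / total_u)
            / (semi_structured_counts[s] / total_s)
        )
        for s in shared
    }


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy counts, NOT data from the study.
unstructured = {"sp_large": 300, "sp_medium": 200, "sp_small": 100}
semi_structured = {"sp_large": 200, "sp_medium": 200, "sp_small": 200}
body_mass_g = {"sp_large": 1000.0, "sp_medium": 100.0, "sp_small": 10.0}

index = representation_index(unstructured, semi_structured)
species = sorted(index)
slope = ols_slope(
    [math.log10(body_mass_g[s]) for s in species],
    [index[s] for s in species],
)
print(index)
print(slope)
```

In this toy setup a positive slope would correspond to larger-bodied species being over-represented in the unstructured dataset. A real analysis would also need to account for sampling effort, spatial structure, and the other traits tested (commonness, flock size, color).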