Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps.

Author information

Alejandro Ruete

Affiliation

Swedish University of Agricultural Sciences, Uppsala, Sweden.

Publication information

Biodivers Data J. 2015 Jul 28;(3):e5361. doi: 10.3897/BDJ.3.e5361. eCollection 2015.

Abstract

BACKGROUND

Open-access biodiversity databases, which consist mainly of citizen-science data, make temporally and spatially extensive species observation data available to a wide range of users. Such data have limitations, however, including sampling bias towards the distribution of recorders, a lack of any assessment of survey effort, and incomplete coverage of the distributions of all organisms. These limitations are not always documented, yet any technical assessment or scientific research based on such data should include an evaluation of the uncertainty in its source data, and researchers should acknowledge this uncertainty in their analyses. The maps of ignorance proposed here are a critical and easily implemented tool, not only for visually exploring data quality but also for filtering out unreliable results.
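To illustrate the filtering use mentioned above, here is a minimal Python sketch that drops per-cell results wherever the ignorance score exceeds a threshold. The function name `filter_reliable`, the cell IDs, the example scores, and the 0.5 threshold are all hypothetical; how the scores themselves are computed is deliberately left open.

```python
# Hypothetical sketch: keep per-cell results only where ignorance is low.
# Cell IDs, scores and the default threshold are illustrative assumptions.


def filter_reliable(results, ignorance, threshold=0.5):
    """Keep only results from cells whose ignorance score is <= threshold.

    results   -- dict mapping cell ID to any per-cell result (e.g. a record count)
    ignorance -- dict mapping cell ID to an ignorance score in [0, 1]
    Cells without an ignorance score are treated as fully ignorant (1.0).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return {
        cell: value
        for cell, value in results.items()
        if ignorance.get(cell, 1.0) <= threshold
    }


if __name__ == "__main__":
    # Per-cell record counts of a hypothetical target species
    species_counts = {"A1": 3, "A2": 0, "B1": 1, "B2": 0}
    # Per-cell ignorance scores (0 = well sampled, 1 = unsampled); B2 has none
    ignorance = {"A1": 0.1, "A2": 0.2, "B1": 0.9}
    print(filter_reliable(species_counts, ignorance))
```

Here cell A2 is kept despite its zero count: where ignorance is low, a zero is plausibly informative, whereas the zero in B2 comes from an unscored (treated as unsampled) cell and says little.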

NEW INFORMATION

I present simple algorithms for displaying ignorance maps as a tool to report the spatial distribution of bias and lack of sampling effort across a study region. Ignorance scores are derived solely from raw data so as to rely on as few assumptions as possible; no prediction or estimation is involved. The rationale rests on the assumption that species groups are an appropriate surrogate for sampling effort, because an entire group of species observed by similar methods is likely to share a similar bias. Simple algorithms then transform the raw data into ignorance scores scaled from 0 to 1 that are easy to compare and scale. Because calculations must be performed over large datasets, simplicity is crucial for web-based implementations on biodiversity information infrastructures. With these algorithms, any such infrastructure can offer a quality report on the observations accessed through it. Users can specify a reference taxonomic group and a time frame according to their research question. The potential of this tool lies in the simplicity of its algorithms and in the absence of assumptions about how the bias is distributed, which gives users the freedom to tailor analyses to their specific needs.
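The steps described above (choose a reference taxonomic group and a time frame, count that group's records per grid cell as a surrogate for effort, and rescale the counts to 0-1) can be sketched in Python. The record layout, the square grid, the function names, and the saturating transform `o_half / (N + o_half)` are assumptions made for illustration; the abstract does not state the exact formula.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One raw record; this field layout is a hypothetical example."""
    group: str   # taxonomic group of the observed species, e.g. "Aves"
    x: float     # projected easting, metres
    y: float     # projected northing, metres
    when: date   # observation date


def effort_per_cell(observations, group, start, end, cell_size):
    """Count records of the reference group within [start, end] per square cell.

    All records of the group act as a surrogate for sampling effort.
    """
    counts = Counter()
    for obs in observations:
        if obs.group == group and start <= obs.when <= end:
            cell = (int(obs.x // cell_size), int(obs.y // cell_size))
            counts[cell] += 1
    return counts


def ignorance_scores(counts, cells, o_half=10):
    """Rescale counts N to o_half / (N + o_half), a score in (0, 1].

    An unsampled cell scores 1.0, a cell with exactly o_half records
    scores 0.5, and the score approaches 0 as records accumulate.
    """
    if o_half <= 0:
        raise ValueError("o_half must be positive")
    return {cell: o_half / (counts.get(cell, 0) + o_half) for cell in cells}


if __name__ == "__main__":
    records = [
        Observation("Aves", 150.0, 50.0, date(2014, 5, 1)),
        Observation("Aves", 120.0, 80.0, date(2014, 6, 3)),
        Observation("Aves", 950.0, 40.0, date(2014, 6, 9)),
        Observation("Aves", 140.0, 60.0, date(2010, 1, 1)),     # outside time frame
        Observation("Odonata", 960.0, 10.0, date(2014, 7, 2)),  # not the reference group
    ]
    effort = effort_per_cell(records, "Aves", date(2014, 1, 1), date(2014, 12, 31), 100.0)
    grid = [(0, 0), (1, 0), (9, 0)]
    for cell, score in ignorance_scores(effort, grid, o_half=2).items():
        print(cell, round(score, 3))
```

With these inputs the unsampled cell (0, 0) scores 1.0, cell (1, 0) with two records scores 0.5, and cell (9, 0) with one record scores about 0.667. In this sketch `o_half` is the single tuning knob: it fixes how many records a cell needs before its ignorance is halved, so it would be chosen per reference group.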
