Department of Zoology and Ecology, University of Navarra, Pamplona, Navarra, Spain.
PLoS One. 2013;8(1):e55144. doi: 10.1371/journal.pone.0055144. Epub 2013 Jan 25.
To effectively understand and cope with the current 'biodiversity crisis', sufficiently large sets of good-quality data are necessary. Information facilitators such as the Global Biodiversity Information Facility (GBIF) are making primary biodiversity records increasingly available by linking data collections held by several institutions that have agreed to publish their data under a common access schema. We assessed the primary records that one such publisher, the Spanish node of GBIF (GBIF.ES), hosts on behalf of a number of institutions, in order to determine the quantity and quality of the information made available; these records are considered a highly representative sample of all the data available for a country. Our results may provide an indication of the overall fitness-for-use of these data. We found a number of patterns in the availability and accrual of data that seem to arise naturally from the digitization process. Knowing these patterns and features may help in deciding when and how these data can be used. Broadly, the error level seems low. The available data may be of capital importance for the development of biodiversity research, both locally and globally. However, wide swaths of records lack data elements such as georeferencing or taxonomic levels. Although the remaining information is ample and fit for many uses, improving the completeness of the records would likely broaden the range of uses for these data.