Evaluating the feasibility of automating dataset retrieval for biodiversity monitoring.

Authors

Fuster-Calvo Alexandre, Valentin Sarah, Tamayo William C, Gravel Dominique

Affiliations

Biology Department, University of Sherbrooke, Sherbrooke, Quebec, Canada.

Joint Research Unit Land, Remote Sensing and Spatial Information (UMR TETIS), French Agricultural Research Centre for International Development (CIRAD), Montpellier, France.

Publication information

PeerJ. 2025 Jan 29;13:e18853. doi: 10.7717/peerj.18853. eCollection 2025.

Abstract

AIM

Effective management strategies for conserving biodiversity and mitigating the impacts of global change rely on access to comprehensive and up-to-date biodiversity data. However, manual search, retrieval, evaluation, and integration of this information into databases present a significant challenge to keeping pace with the rapid influx of large amounts of data, hindering its utility in contemporary decision-making processes. Automating these tasks through advanced algorithms holds immense potential to revolutionize biodiversity monitoring.

INNOVATION

In this study, we investigate the potential for automating the retrieval and evaluation of biodiversity data from the Dryad and Zenodo repositories. We designed an evaluation system based on several criteria, including the type of data provided and its spatio-temporal range, and applied it to manually assess the relevance for biodiversity monitoring of datasets retrieved through an application programming interface (API). We then evaluated a supervised classification approach for identifying potentially relevant datasets and investigated the feasibility of automatically ranking their relevance. Additionally, we applied the same approach to a scientific literature source, using data from Semantic Scholar for reference. Our evaluation centers on the database used by a national biodiversity monitoring system in Quebec, Canada.
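As an illustration of how criteria such as data type and spatio-temporal range could feed an automatic relevance ranking, the following is a minimal Python sketch. The criteria names, weights, and the 10-year cap are hypothetical and chosen only for the example; the abstract does not specify the scoring actually used in the study.

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only; not taken from the study.
WEIGHTS = {"data_type": 2.0, "spatial": 1.0, "temporal": 1.0}


@dataclass
class DatasetRecord:
    title: str
    has_occurrence_data: bool  # type of data provided
    in_region: bool            # spatial range overlaps the region of interest
    years_covered: int         # length of the temporal range, in years


def relevance_score(rec: DatasetRecord) -> float:
    """Sum the weighted criteria that a dataset record satisfies."""
    score = 0.0
    if rec.has_occurrence_data:
        score += WEIGHTS["data_type"]
    if rec.in_region:
        score += WEIGHTS["spatial"]
    # Longer time series score higher, capped at 10 years (assumed cap).
    score += WEIGHTS["temporal"] * min(rec.years_covered, 10) / 10
    return score


def rank(records):
    """Return records ordered from most to least relevant."""
    return sorted(records, key=relevance_score, reverse=True)


records = [
    DatasetRecord("a", False, True, 5),
    DatasetRecord("b", True, True, 20),
    DatasetRecord("c", True, False, 0),
]
ranked_titles = [r.title for r in rank(records)]
```

A rule-based score like this is only one possible design; a supervised classifier trained on manually evaluated records, as described above, would learn such weightings from data instead.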

MAIN CONCLUSIONS

We retrieved 89 (55%) relevant datasets for our database, showing the value of automated dataset search in repositories. Additionally, we found that scientific publication sources offer broader temporal coverage and can serve as conduits guiding researchers toward other valuable data sources. Our automated classification system showed moderate performance in detecting relevant datasets (F-score up to 0.68) and signs of overfitting, underscoring the need for further refinement. A key challenge identified in our manual evaluation is the scarcity and uneven distribution of metadata in the texts, especially regarding spatial and temporal extents. Our evaluative framework, based on predefined criteria, can be adopted by automated algorithms for streamlined prioritization, and we make our manually evaluated data publicly available to serve as a benchmark for improving classification techniques.
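For readers unfamiliar with the F-score reported above, the sketch below shows how it is computed from precision and recall. The confusion counts in the usage lines are made up for illustration and are not the study's results; the function name and the `beta` parameter are likewise assumptions of this example.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 6 relevant datasets found, 2 false alarms, 4 missed.
example = f_score(tp=6, fp=2, fn=4)  # precision 0.75, recall 0.60
```

With `beta = 1` this is the usual F1 score, which weighs precision and recall equally; the abstract does not state which variant was used.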

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e28a/11786708/0f349d789673/peerj-13-18853-g001.jpg
