Wageningen Environmental Research (WENR), Wageningen University & Research, Wageningen, Netherlands.
International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria.
PLoS One. 2023 Jul 13;18(7):e0287731. doi: 10.1371/journal.pone.0287731. eCollection 2023.
Reference data are key to producing reliable crop type and cropland maps. Although research projects, national and international programs, and local initiatives constantly gather crop-related reference data, finding, collecting, and harmonising data from different sources is a challenging task. Furthermore, ethical, legal, and consent-related restrictions associated with data sharing represent a common dilemma faced by international research projects. We address these dilemmas by building a community-based, open, harmonised reference data repository of global extent, ready for model training or product validation. Our repository contains data from different sources such as the Group on Earth Observations Global Agricultural Monitoring Initiative (GEOGLAM) Joint Experiment for Crop Assessment and Monitoring (JECAM) sites, the Radiant MLHub, the Future Harvest (CGIAR) centers, the National Aeronautics and Space Administration Food Security and Agriculture Program (NASA Harvest), the International Institute for Applied Systems Analysis (IIASA) citizen science platforms (LACO-Wiki and Geo-Wiki), as well as from individual project contributions. Data from 2016 onwards were collected, harmonised, and annotated. The spatial, temporal, and thematic quality of the data sets was assessed by applying rules developed in this research. Currently, the repository holds around 75 million harmonised observations with standardized metadata, of which a large share is available to the public. The repository, funded by ESA through the WorldCereal project, can be used either for the calibration of deep learning image classification algorithms or for the validation of Earth Observation-derived products, such as global cropland extent maps and maize and wheat maps. We recommend continuing and institutionalizing this reference data initiative, e.g., through GEOGLAM, and encouraging the community to publish land cover and crop type data following open science and open data principles.