Contractor, U.S. Geological Survey Pacific Coastal and Marine Science Center, Santa Cruz, CA, USA.
U.S. Geological Survey Pacific Coastal and Marine Science Center, Santa Cruz, CA, USA.
Sci Data. 2023 Jan 20;10(1):46. doi: 10.1038/s41597-023-01929-2.
The world's coastlines are spatially highly variable, coupled human-natural systems that comprise a nested hierarchy of component landforms, ecosystems, and human interventions, each interacting over a range of space and time scales. Understanding and predicting coastline dynamics necessitates frequent observation from imaging sensors on remote sensing platforms. Machine learning models that carry out supervised (i.e., human-guided) pixel-based classification, or image segmentation, have transformative applications in spatio-temporal mapping of dynamic environments, including transient coastal landforms, sediments, habitats, waterbodies, and water flows. However, these models require large, well-documented training and testing datasets consisting of labeled imagery. We describe "Coast Train," a multi-labeler dataset of orthomosaic and satellite images of coastal environments and corresponding labels. These data include imagery that is diverse in space and time and contain 1.2 billion labeled pixels, representing over 3.6 million hectares. We use a human-in-the-loop tool especially designed for rapid and reproducible Earth surface image segmentation. Our approach permits image labeling by multiple labelers, in turn enabling quantification of pixel-level agreement over individual images and collections of images.
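The abstract does not say which agreement measure is used. As a hedged illustration only, the sketch below computes simple percent agreement, meaning the fraction of pixels to which two labelers assigned the same class. It does this for one image and, weighted by pixel count, for a collection of images. The integer-class mask representation and the function names (`image_agreement`, `collection_agreement`) are hypothetical. The published Coast Train workflow may use a different metric.

```python
from typing import Sequence

# Hypothetical representation: a label mask is a 2-D grid of integer class IDs
# (e.g., 0 = water, 1 = sand). Not taken from the Coast Train data format.
Mask = Sequence[Sequence[int]]


def pixel_agreement_counts(mask_a: Mask, mask_b: Mask) -> tuple[int, int]:
    """Return (agreeing_pixels, total_pixels) for two labelers' masks of one image."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of rows")
    agree = total = 0
    for row_a, row_b in zip(mask_a, mask_b):
        if len(row_a) != len(row_b):
            raise ValueError("masks must have the same number of columns")
        for a, b in zip(row_a, row_b):
            agree += a == b  # True counts as 1, False as 0
            total += 1
    return agree, total


def image_agreement(mask_a: Mask, mask_b: Mask) -> float:
    """Fraction of pixels in one image on which two labelers chose the same class."""
    agree, total = pixel_agreement_counts(mask_a, mask_b)
    if total == 0:
        raise ValueError("masks are empty")
    return agree / total


def collection_agreement(pairs: Sequence[tuple[Mask, Mask]]) -> float:
    """Pixel-weighted agreement pooled over a collection of doubly labeled images."""
    agree_sum = total_sum = 0
    for mask_a, mask_b in pairs:
        agree, total = pixel_agreement_counts(mask_a, mask_b)
        agree_sum += agree
        total_sum += total
    if total_sum == 0:
        raise ValueError("collection contains no pixels")
    return agree_sum / total_sum


if __name__ == "__main__":
    img1 = ([[0, 0, 1], [0, 1, 1]], [[0, 1, 1], [0, 1, 1]])  # labelers differ on 1 of 6 pixels
    img2 = ([[2, 2], [2, 2]], [[2, 2], [2, 0]])              # labelers differ on 1 of 4 pixels
    print(image_agreement(*img1))               # 5/6
    print(collection_agreement([img1, img2]))   # (5 + 3) / (6 + 4) = 0.8
```

Pooling counts before dividing weights larger images more heavily. Averaging the per-image fractions instead would give every image equal weight. Which choice is appropriate depends on whether agreement is being summarized per pixel or per image.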