Yao Yao, Ma Yueheng, Gao Ronghui, Yan Xiaoqin, Guan Qingfeng
UrbanComp Lab, School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430078, Hubei province, China.
National Engineering Research Center of Geographic Information System, China University of Geosciences, Wuhan, 430078, Hubei province, China.
Sci Data. 2025 May 7;12(1):760. doi: 10.1038/s41597-025-05047-z.
High-quality land use datasets are essential for advancing research in land use classification and recognition. However, the complexity and spatial heterogeneity of land use make such datasets difficult to construct. To address this, we present MSLU-100K, a multi-source land use dataset of over 100,000 irregular parcel samples from 81 Chinese cities. Built with a human-computer collaboration framework, the dataset integrates remote sensing and point-of-interest (POI) data and categorizes parcels into 7 primary and 28 secondary land use types. A novel multi-level classification approach combines manual labeling and deep learning, with samples graded into six quality levels to control data quality. High-quality samples (Levels 4 and 5) make up over 57% of the dataset and significantly enhance classification performance. The dataset provides a robust resource for land use recognition, urban planning, and spatial research.
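The abstract singles out Levels 4 and 5 as the high-quality subset. The following is a minimal sketch of how a user might select that subset and compute its share. It assumes each sample carries an integer quality level; the `Parcel` record, its field names, and the toy values are hypothetical and are not taken from the released files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Parcel:
    """Hypothetical record for one parcel sample (field names are assumptions)."""
    parcel_id: str
    primary_type: str    # one of the 7 primary land use types
    secondary_type: str  # one of the 28 secondary land use types
    quality_level: int   # quality grade; the abstract names Levels 4 and 5 as high quality


# Levels the abstract describes as high quality.
HIGH_QUALITY_LEVELS = frozenset({4, 5})


def select_high_quality(parcels: Iterable[Parcel]) -> List[Parcel]:
    """Keep only parcels whose quality level is 4 or 5."""
    return [p for p in parcels if p.quality_level in HIGH_QUALITY_LEVELS]


def high_quality_share(parcels: List[Parcel]) -> float:
    """Fraction of parcels graded Level 4 or 5 (0.0 for an empty list)."""
    if not parcels:
        return 0.0
    return len(select_high_quality(parcels)) / len(parcels)


# Toy data with placeholder type names, for illustration only.
toy_parcels = [
    Parcel("p1", "primary_a", "secondary_a1", 5),
    Parcel("p2", "primary_a", "secondary_a2", 4),
    Parcel("p3", "primary_b", "secondary_b1", 2),
    Parcel("p4", "primary_c", "secondary_c1", 3),
]

high_quality = select_high_quality(toy_parcels)
share = high_quality_share(toy_parcels)
```

On the toy list, two of the four parcels pass the filter, so `share` is 0.5. Applied to the full dataset, the same ratio should come out above 0.57 if the quality field is encoded as assumed here.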