Yan YuHan, Wu PengDa, Yin Yong, Guo PeiPei
Department of Geographic Information System, Chinese Academy of Surveying and Mapping, Beijing, 100036, China.
School of Information Engineering, China University of Geosciences, Beijing, 100083, China.
Sci Rep. 2024 Dec 30;14(1):31616. doi: 10.1038/s41598-024-79812-2.
Geographic entity matching is an important means of fusing multi-source spatial data and of associating and sharing information. Existing studies have designed matching methods tailored to the characteristics of particular entity types, such as lines and areas. However, these approaches often generalize poorly when matching heterogeneous data from multiple sources and lose accuracy on complex matching patterns. To resolve these problems, we propose a robust multi-source geographic entity matching method that maximizes geometric and semantic similarity. First, each entity is segmented based on shape features, and the resulting feature segments are extracted as matching primitives. Second, the feature segments are grouped into patterns comprising three major categories and fourteen subcategories. Third, pattern matching is performed using spatial similarity metrics such as maximum projection distance. Finally, the spatial matches are checked and refined through semantic similarity calculation. The proposed method is tested on two datasets from regions in southeast and northwest China. The experimental results demonstrate that the method can be applied effectively to both area and line entity matching, with strong generalization and application capability and significantly improved matching accuracy. Specifically, nine feature segment matching patterns are used for area entities and six for line entities, and both precision and recall are nearly 90%.
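The four steps in the abstract (segment, group, match spatially, refine semantically) can be illustrated with a minimal sketch for line entities. This is not the authors' implementation: the abstract does not define its segmentation rule, its fourteen pattern subcategories, or its exact metrics, so everything below is an assumption made for illustration. In particular, `split_at_sharp_turns` stands in for shape-based segmentation, `max_projection_distance` is read here as the largest distance from a vertex of one segment to its nearest point on the other, name similarity from `difflib.SequenceMatcher` stands in for semantic similarity, and the thresholds `max_dist` and `min_name_sim` are arbitrary. Pattern grouping is omitted.

```python
import math
from difflib import SequenceMatcher


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Clamp the projection parameter so the foot stays on the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def max_projection_distance(src, dst):
    """Assumed definition: largest vertex-to-polyline distance from src to dst."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in zip(dst, dst[1:]))
        for p in src
    )


def split_at_sharp_turns(line, min_turn_deg=45.0):
    """Hypothetical shape-based segmentation: cut a polyline at sharp turns."""
    segments, current = [], [line[0]]
    for prev, cur, nxt in zip(line, line[1:], line[2:]):
        current.append(cur)
        h1 = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        h2 = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        turn = abs(math.degrees(h2 - h1))
        turn = min(turn, 360.0 - turn)  # fold into [0, 180]
        if turn >= min_turn_deg:
            segments.append(current)
            current = [cur]  # adjacent segments share the cut vertex
    current.append(line[-1])
    segments.append(current)
    return segments


def match_entities(src_line, dst_line, src_name, dst_name,
                   max_dist=5.0, min_name_sim=0.6):
    """Match feature segments spatially, then keep them only if names agree."""
    src_segs = split_at_sharp_turns(src_line)
    dst_segs = split_at_sharp_turns(dst_line)

    # Spatial step: pair each source segment with its closest target segment.
    spatial = []
    for i, seg in enumerate(src_segs):
        dists = [max_projection_distance(seg, d) for d in dst_segs]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= max_dist:
            spatial.append((i, j, dists[j]))

    # Semantic step: discard the spatial matches when the names disagree.
    name_sim = SequenceMatcher(None, src_name.lower(), dst_name.lower()).ratio()
    return spatial if name_sim >= min_name_sim else []


# Two made-up digitizations of the same road, offset by about one unit.
road_a = [(0, 0), (10, 0), (10, 10)]
road_b = [(0, 1), (10, 1), (11, 10)]
matches = match_entities(road_a, road_b, "Renmin Road", "Renmin Rd")
```

Running the spatial step before the semantic check mirrors the order given in the abstract, where semantic similarity is used to check and refine matches that geometry has already proposed. Area entities would need the same idea applied to boundary rings, which this sketch does not cover.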