Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, 475004, China.
School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China.
Sci Rep. 2022 May 5;12(1):7352. doi: 10.1038/s41598-022-10648-4.
The rapid popularization of high-speed mobile communication technology and the continuous development of mobile network devices have given spatial textual big data (STBD) new dimensions due to their ability to record geographical objects from multiple sources and with complex attributes. Mining spatial textual datasets has become a meaningful area of study. As a popular topic in STBD, the top-k spatial keyword query has been developed in various forms to meet different retrieval requirements. However, previous research focused mainly on indexing locational attributes and retrieving a few target attributes, and the correlations among large numbers of textual attributes have not been fully studied or demonstrated. To further explore the interrelated knowledge in the textual attributes, this paper defines the top-k frequent spatial keyword query (tfSKQ) and proposes a novel hybrid index structure, named RCL-tree, based on concept lattice theory. We also develop tfSKQ algorithms to retrieve the most frequent and nearest spatial objects in STBD. One existing method and two baseline algorithms are implemented, and a series of experiments is carried out on real datasets to evaluate the performance of the proposed method. The results demonstrate the effectiveness and efficiency of the proposed RCL-tree for tfSKQ under complex spatial multi-keyword query conditions.