gRDF: An Efficient Compressor with Reduced Structural Regularities That Utilizes gRePair.

Author information

Sultana Tangina, Lee Young-Koo

Affiliation

Department of Computer Science and Engineering, Kyung Hee University, Global Campus, Yongin-si 17104, Korea.

Publication information

Sensors (Basel). 2022 Mar 26;22(7):2545. doi: 10.3390/s22072545.

Abstract

The explosive growth of semantic data published in the Resource Description Framework (RDF) data model demands efficient management and compression, with better compression ratios and runtimes. Although extensive work has been carried out on compressing RDF datasets, existing compressors do not perform well in all dimensions, and they rarely exploit the graph patterns and structural regularities of real-world datasets. Meanwhile, a variety of existing approaches reduce the size of a graph by using a grammar-based graph compression algorithm. In this study, we introduce a novel approach named gRDF (graph repair for RDF) that uses gRePair, one of the most efficient grammar-based graph compression schemes, to compress RDF datasets. In addition, we improve the performance of HDT (header-dictionary-triple), an efficient approach for compressing RDF datasets based on structural properties, by introducing modified HDT (M-HDT), which detects frequent graph patterns in a single pass over the dataset using a data-structure-oriented approach. In the proposed system, we use M-HDT to index the nodes and edge labels. We then employ the gRePair algorithm to identify the grammar from the RDF graph. Afterward, the system improves the performance of k²-trees by introducing a more efficient algorithm to create the trees and serialize the RDF datasets. Our experiments on real-world datasets show that the proposed gRDF scheme achieves approximately 26.12%, 13.68%, 6.81%, 2.38%, and 12.76% better compression ratios than the most prominent state-of-the-art schemes, namely HDT, HDT++, k²-trees, RDF-TR, and gRePair, respectively. Moreover, the processing efficiency of the proposed scheme also surpasses that of the others.
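To make two of the building blocks named in the abstract more concrete, the following is a minimal Python sketch of (1) an HDT-style dictionary that maps nodes and edge labels to integer IDs and (2) a textbook k2-tree (k²-tree) built over the adjacency matrix of one predicate. It is an illustration under stated assumptions, not the authors' M-HDT or their improved tree-construction algorithm, and it omits the gRePair grammar step entirely. The function names `build_dictionary` and `build_k2_tree`, the sample triples, and the choice k = 2 are all hypothetical.

```python
from collections import deque


def build_dictionary(triples):
    """Assign integer IDs to nodes and predicates in first-seen order.

    Hypothetical helper: a plain dictionary encoding in the spirit of HDT,
    not the M-HDT structure described in the abstract.
    """
    nodes, preds, encoded = {}, {}, []
    for s, p, o in triples:
        sid = nodes.setdefault(s, len(nodes))
        pid = preds.setdefault(p, len(preds))
        oid = nodes.setdefault(o, len(nodes))
        encoded.append((sid, pid, oid))
    return nodes, preds, encoded


def build_k2_tree(cells, n, k=2):
    """Build the level-order bitmaps (T, L) of a standard k2-tree.

    cells: iterable of (row, col) positions holding a 1.
    n:     matrix side length; assumed to be a power of k and greater than 1.
    T holds the bits of internal levels, L the bits of the last level.
    """
    T, L = [], []
    queue = deque([(0, 0, n, set(cells))])
    while queue:
        r0, c0, size, pts = queue.popleft()
        sub = size // k  # side length of each of the k*k submatrices
        for i in range(k):
            for j in range(k):
                rs, cs = r0 + i * sub, c0 + j * sub
                inside = {(r, c) for r, c in pts
                          if rs <= r < rs + sub and cs <= c < cs + sub}
                bit = 1 if inside else 0
                if sub == 1:
                    L.append(bit)      # last level: individual matrix cells
                else:
                    T.append(bit)      # internal level: is the submatrix non-empty?
                    if bit:
                        queue.append((rs, cs, sub, inside))
    return T, L


# Illustrative data only; these triples are not from the paper's datasets.
triples = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("d", "knows", "a"),
    ("b", "likes", "c"),
]
nodes, preds, encoded = build_dictionary(triples)

# One adjacency matrix per predicate; here only the one for "knows".
knows_cells = [(s, o) for s, p, o in encoded if p == preds["knows"]]
T, L = build_k2_tree(knows_cells, n=4)
print(T, L)
```

Empty submatrices are represented by a single 0 bit and never expanded, which is why k2-trees suit the sparse adjacency matrices typical of RDF graphs. In a full pipeline, the node count would be padded up to the next power of k before building the tree.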

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6b0/9003471/d637e2efd75e/sensors-22-02545-g001.jpg
