Information extraction pipelines for knowledge graphs.

Author information

Jaradeh Mohamad Yaser, Singh Kuldeep, Stocker Markus, Both Andreas, Auer Sören

Affiliations

L3S Research Center, Leibniz University Hannover, Hanover, Germany.

Zerotha-Research and Cerence GmbH, Aachen, Germany.

Publication information

Knowl Inf Syst. 2023;65(5):1989-2016. doi: 10.1007/s10115-022-01826-x. Epub 2023 Jan 7.

Abstract

In the last decade, a large number of knowledge graph (KG) completion approaches have been proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community's disjoint efforts on KG completion. We add components to Plumber's architecture, bringing it to 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers 432 distinct pipelines overall. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting KG triples using standard datasets over three KGs: DBpedia, Wikidata, and the Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines regardless of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among the integrated components, and discuss their limitations.
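As a rough illustration of the idea described in the abstract, the sketch below enumerates candidate pipelines as combinations of one component per subtask and then picks one pipeline for a given input sentence. It is only a sketch under stated assumptions: the component names (`coref_a`, `el_b`, and so on), the functions `enumerate_pipelines`, `choose_pipeline`, and `toy_score`, and the pronoun heuristic are all hypothetical and are not taken from Plumber. The toy scoring function stands in for the transformer-based classifier mentioned above, and the counts do not reproduce the 40 components or 432 pipelines reported.

```python
from itertools import product

# Hypothetical component names per subtask (NOT Plumber's actual components).
COMPONENTS = {
    "coreference_resolution": ["coref_a", "coref_b"],
    "entity_linking": ["el_a", "el_b", "el_c"],
    "relation_extraction": ["re_a", "re_b"],
}


def enumerate_pipelines(components):
    """Build every pipeline that picks exactly one component per subtask."""
    stages = list(components)
    choices = (components[stage] for stage in stages)
    return [dict(zip(stages, combo)) for combo in product(*choices)]


def toy_score(sentence, pipeline):
    """Placeholder for a learned classifier (assumption, not the paper's model).

    Gives a score of 1.0 when 'coref_b' is used exactly for sentences that
    contain a pronoun, and 0.0 otherwise.
    """
    tokens = sentence.lower().split()
    has_pronoun = any(tok in {"he", "she", "it", "they"} for tok in tokens)
    uses_coref_b = pipeline["coreference_resolution"] == "coref_b"
    return 1.0 if uses_coref_b == has_pronoun else 0.0


def choose_pipeline(sentence, pipelines, score=toy_score):
    """Return the highest-scoring pipeline for the sentence (first on ties)."""
    return max(pipelines, key=lambda p: score(sentence, p))


pipelines = enumerate_pipelines(COMPONENTS)
chosen = choose_pipeline("She founded it in Hannover", pipelines)
```

With these made-up components the enumeration yields 2 × 3 × 2 = 12 pipelines. In a real system the scoring step would be replaced by a trained model over contextual embeddings of the sentence, as the abstract describes.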

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/898e/10076429/6306ad280517/10115_2022_1826_Fig1_HTML.jpg
