Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature.

Author information

Ruas Pedro, Lamurias Andre, Couto Francisco M

Affiliation

LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisbon, Portugal.

Publication information

J Cheminform. 2020 Sep 21;12(1):57. doi: 10.1186/s13321-020-00461-4.

Abstract

BACKGROUND

Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient because of information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse.

FINDINGS

This work proposes a Named Entity Linking framework called Relation Extraction for Entity Linking (REEL), which uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to the relations expressed in the text, which the method extracts automatically. The PPR algorithm and the information content of each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset of the BC5CDR corpus with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). REEL achieved F1-scores of 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming the baseline approaches.
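The candidate selection step described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy identifiers (`A1`, `A2`, `B1`), the information content values and the way coherence is combined with information content (a simple product) are all assumptions made for illustration. The sketch runs a plain power-iteration Personalized PageRank from each candidate and, for every mention, keeps the candidate that receives the most PPR mass from the candidates of the other mentions.

```python
def personalized_pagerank(nodes, edges, source, damping=0.85, iterations=50):
    """Power-iteration PPR on an undirected graph, restarting at `source`.

    Assumes every edge endpoint is contained in `nodes`.
    """
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    scores = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(iterations):
        # Restart mass always goes back to the source node.
        nxt = {n: ((1.0 - damping) if n == source else 0.0) for n in nodes}
        for n in nodes:
            if not neighbours[n]:
                # Isolated node: hand its mass back to the source.
                nxt[source] += damping * scores[n]
                continue
            share = damping * scores[n] / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        scores = nxt
    return scores


def link_entities(candidates, edges, information_content):
    """Pick one candidate per mention (hypothetical scoring, see lead-in)."""
    nodes = {c for cands in candidates.values() for c in cands}
    ppr = {n: personalized_pagerank(nodes, edges, n) for n in nodes}

    chosen = {}
    for mention, cands in candidates.items():
        # Candidates belonging to all the other mentions in the document.
        others = [c for m, cs in candidates.items() if m != mention for c in cs]

        def score(c):
            coherence = sum(ppr[s][c] for s in others)
            return coherence * information_content[c]

        chosen[mention] = max(cands, key=score)
    return chosen


# Toy input: two mentions, made-up candidate identifiers.
candidates = {"mention_1": ["A1", "A2"], "mention_2": ["B1"]}
# One edge, standing in for a relation extracted from the text.
edges = [("A1", "B1")]
# Made-up information content values.
information_content = {"A1": 0.5, "A2": 0.9, "B1": 0.7}

links = link_entities(candidates, edges, information_content)
print(links)
```

In this toy graph `A2` has no edges, so it receives no PPR mass from `B1` and `A1` is selected for `mention_1` despite its lower information content. This is the kind of situation in which an extracted relation adds an edge that a sparse graph would otherwise lack.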

CONCLUSIONS

We demonstrated that Relation Extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from Knowledge Bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline, and potentially to any domain, as long as an ontology or other knowledge base is available.
