Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature.

Author information

Ruas Pedro, Lamurias Andre, Couto Francisco M

Affiliations

LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisbon, Portugal.

Publication information

J Cheminform. 2020 Sep 21;12(1):57. doi: 10.1186/s13321-020-00461-4.

DOI:10.1186/s13321-020-00461-4

PMID:33430995

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7507273/

Abstract

BACKGROUND

Named Entity Linking systems are a powerful aid to the manual curation of digital libraries, which is becoming increasingly costly and inefficient due to information overload. Models based on the Personalized PageRank (PPR) algorithm are among the state-of-the-art approaches, but they perform poorly when the disambiguation graphs are sparse.
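
For readers unfamiliar with PPR, the standard formulation is sketched below as an assumption; PPR-based linkers may approximate it with Monte Carlo random walks rather than solving it exactly. Here alpha is the restart probability, E is the edge set of the disambiguation graph, C is the set of all candidates, C_m the candidates of mention m, and coh(s) a hypothetical coherence score for candidate s, defined as the PPR mass that reaches the candidates of other mentions. The last line shows why sparse graphs hurt: an isolated candidate receives zero coherence, so the graph provides no evidence for or against it.

```latex
% Standard PPR seeded at candidate s (assumed formulation; alpha = restart probability)
\pi^{(s)} = \alpha\,\mathbf{e}_s + (1-\alpha)\,P^{\top}\pi^{(s)},
\qquad
P_{ij} = \begin{cases} 1/\deg(i) & \text{if } (i,j) \in E,\\ 0 & \text{otherwise.} \end{cases}

% Hypothetical coherence of candidate s for mention m
\mathrm{coh}(s) = \sum_{c \in \mathcal{C} \setminus \mathcal{C}_m} \pi^{(s)}_c

% Sparse case: an isolated candidate receives no coherence
\deg(s) = 0 \;\Rightarrow\; \pi^{(s)}_c = 0 \text{ for all } c \neq s \;\Rightarrow\; \mathrm{coh}(s) = 0
```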

FINDINGS

This work proposes a Named Entity Linking framework called Relation Extraction for Entity Linking (REEL) that uses automatically extracted relations to overcome this limitation. Our method builds a disambiguation graph in which the nodes are the ontology candidates for the entities and the edges are added according to the relations expressed in the text, which the method extracts automatically. The PPR algorithm and the information content of the candidates in each ontology are then applied to choose, for each entity, the candidate that maximises the coherence of the disambiguation graph. We evaluated the method on three gold standards: the subset of the CRAFT corpus with ChEBI annotations (CRAFT-ChEBI), the subset of the BC5CDR corpus with disease annotations from the MEDIC vocabulary (BC5CDR-Diseases) and the subset of BC5CDR with chemical annotations from the CTD-Chemical vocabulary (BC5CDR-Chemicals). REEL achieved F1-scores of 85.8%, 80.9% and 90.3% on these gold standards, respectively, outperforming the baseline approaches.
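
The graph-based selection step described above can be illustrated with a minimal Python sketch. It assumes an undirected graph whose edges come from relations that have already been extracted, exact power-iteration PPR with restart probability `alpha`, coherence measured as the PPR mass that reaches the candidates of other mentions, and a hypothetical weighted sum with a precomputed information content (IC) score in [0, 1]. The function names, the `weight` parameter and the ontology identifiers in the example are illustrative only; REEL's actual scoring function and relation extraction step are described in the full paper and may differ.

```python
from collections import defaultdict


def personalized_pagerank(nodes, edges, seed, alpha=0.15, iters=50):
    """Power-iteration PPR on an undirected graph, restarting at `seed`.

    alpha is the restart (teleport) probability. Mass sitting on nodes
    without neighbours is sent back to the seed, so scores sum to 1.
    """
    node_set = set(nodes)
    neighbours = defaultdict(set)
    for u, v in edges:
        # Ignore self-loops and edges to nodes outside the candidate set.
        if u != v and u in node_set and v in node_set:
            neighbours[u].add(v)
            neighbours[v].add(u)

    scores = {n: 0.0 for n in node_set}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in node_set}
        nxt[seed] += alpha
        for n, s in scores.items():
            nbrs = neighbours[n]
            if nbrs:
                share = (1 - alpha) * s / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Dangling node: the walk restarts at the seed.
                nxt[seed] += (1 - alpha) * s
        scores = nxt
    return scores


def link_entities(candidates, edges, ic, weight=0.5):
    """Pick one ontology ID per mention (hypothetical REEL-style scoring).

    candidates: mention -> list of candidate ontology IDs
    edges: (id, id) pairs derived from relations extracted from the text
    ic: ontology ID -> information content, assumed normalised to [0, 1]
    """
    nodes = {c for cands in candidates.values() for c in cands}
    chosen = {}
    for mention, cands in candidates.items():
        others = {c for m, cs in candidates.items() if m != mention for c in cs}
        best, best_score = None, float("-inf")
        for cand in cands:
            ppr = personalized_pagerank(nodes, edges, seed=cand)
            # Coherence: random-walk mass reaching other mentions' candidates.
            coherence = sum(ppr[o] for o in others if o != cand)
            score = coherence + weight * ic.get(cand, 0.0)
            if score > best_score:
                best, best_score = cand, score
        chosen[mention] = best
    return chosen


if __name__ == "__main__":
    # Hypothetical IDs: "MI" is ambiguous; one relation was extracted.
    mentions = {
        "cocaine": ["CHEM:cocaine"],
        "MI": ["DIS:myocardial_infarction", "DIS:mitral_insufficiency"],
    }
    relations = [("CHEM:cocaine", "DIS:myocardial_infarction")]
    info_content = {c: 0.5 for cs in mentions.values() for c in cs}
    print(link_entities(mentions, relations, info_content))
    # prints {'cocaine': 'CHEM:cocaine', 'MI': 'DIS:myocardial_infarction'}
```

In this toy example the extracted cocaine/myocardial infarction relation is the only edge, so the ambiguous abbreviation resolves to the connected candidate even though both candidates have the same IC. Without that edge both candidates would be isolated, their coherence would be zero, and the choice would fall back on IC alone, which is the sparse-graph weakness the extracted relations are meant to address.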

CONCLUSIONS

We demonstrated that relation extraction (RE) tools can improve Named Entity Linking by capturing semantic information that is expressed in text but missing from knowledge bases, and by using it to improve the disambiguation graph of Named Entity Linking models. REEL can be adapted to any text mining pipeline, and potentially to any domain, as long as an ontology or other knowledge base is available.

Similar articles

Linking chemical and disease entities to ontologies by integrating PageRank with extracted relations from literature.

J Cheminform. 2020 Sep 21;12(1):57. doi: 10.1186/s13321-020-00461-4.

PPR-SSM: personalized PageRank and semantic similarity measures for entity linking.

BMC Bioinformatics. 2019 Oct 29;20(1):534. doi: 10.1186/s12859-019-3157-y.

NILINKER: Attention-based approach to NIL Entity Linking.

J Biomed Inform. 2022 Aug;132:104137. doi: 10.1016/j.jbi.2022.104137. Epub 2022 Jul 8.

BO-LSTM: classifying relations via long short-term memory networks along biomedical ontologies.

BMC Bioinformatics. 2019 Jan 7;20(1):10. doi: 10.1186/s12859-018-2584-5.

Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction.

Quant Imaging Med Surg. 2023 Jun 1;13(6):3873-3890. doi: 10.21037/qims-22-1158. Epub 2023 May 8.

Extraction of semantic biomedical relations from text using conditional random fields.

BMC Bioinformatics. 2008 Apr 23;9:207. doi: 10.1186/1471-2105-9-207.

Linking entities through an ontology using word embeddings and syntactic re-ranking.

BMC Bioinformatics. 2019 Mar 27;20(1):156. doi: 10.1186/s12859-019-2678-8.

Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics.

J Biomed Semantics. 2024 Jul 31;15(1):13. doi: 10.1186/s13326-024-00314-1.

Assessment of disease named entity recognition on a corpus of annotated sentences.

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-9-S3-S3.

OGER++: hybrid multi-type entity recognition.

J Cheminform. 2019 Jan 21;11(1):7. doi: 10.1186/s13321-018-0326-3.

Cited by

HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools.

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae564.

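Chemical entity normalization for successful translational development of Alzheimer's disease and dementia therapeutics.

J Biomed Semantics. 2024 Jul 31;15(1):13. doi: 10.1186/s13326-024-00314-1.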

An overview of biomedical entity linking throughout the years.

J Biomed Inform. 2023 Jan;137:104252. doi: 10.1016/j.jbi.2022.104252. Epub 2022 Dec 2.

Hybrid semantic recommender system for chemical compounds in large-scale datasets.

J Cheminform. 2021 Feb 23;13(1):15. doi: 10.1186/s13321-021-00495-2.

References

BERT-based Ranking for Biomedical Entity Normalization.

AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:269-277. eCollection 2020.

PPR-SSM: personalized PageRank and semantic similarity measures for entity linking.

BMC Bioinformatics. 2019 Oct 29;20(1):534. doi: 10.1186/s12859-019-3157-y.

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.

BO-LSTM: classifying relations via long short-term memory networks along biomedical ontologies.

BMC Bioinformatics. 2019 Jan 7;20(1):10. doi: 10.1186/s12859-018-2584-5.

The Comparative Toxicogenomics Database: update 2019.

Nucleic Acids Res. 2019 Jan 8;47(D1):D948-D954. doi: 10.1093/nar/gky868.

Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

BMC Bioinformatics. 2017 Aug 17;18(1):372. doi: 10.1186/s12859-017-1775-9.

TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

Bioinformatics. 2016 Sep 15;32(18):2839-46. doi: 10.1093/bioinformatics/btw343. Epub 2016 Jun 9.

BioCreative V CDR task corpus: a resource for chemical disease relation extraction.

Database (Oxford). 2016 May 9;2016. doi: 10.1093/database/baw068. Print 2016.

ChEBI in 2016: Improved services and an expanding collection of metabolites.

Nucleic Acids Res. 2016 Jan 4;44(D1):D1214-9. doi: 10.1093/nar/gkv1031. Epub 2015 Oct 13.

Entity linking for biomedical literature.

BMC Med Inform Decis Mak. 2015;15 Suppl 1(Suppl 1):S4. doi: 10.1186/1472-6947-15-S1-S4. Epub 2015 May 20.
