Exploiting graph kernels for high performance biomedical relation extraction.

Author information

Panyam Nagesh C, Verspoor Karin, Cohn Trevor, Ramamohanarao Kotagiri

Affiliation

School of Computing and Information Systems, University of Melbourne, Melbourne, Australia.

Publication information

J Biomed Semantics. 2018 Jan 30;9(1):7. doi: 10.1186/s13326-017-0168-3.

DOI:10.1186/s13326-017-0168-3

PMID:29382397

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5791373/

Abstract

BACKGROUND

Relation extraction from biomedical publications is an important task in the area of semantic mining of text. Kernel methods for supervised relation extraction are often preferred over manual feature engineering methods when classifying highly ordered structures such as trees and graphs obtained from syntactic parsing of a sentence. Tree kernels such as the Subset Tree Kernel and Partial Tree Kernel have been shown to be effective for classifying constituency parse trees and basic dependency parse graphs of a sentence. Graph kernels such as the All Path Graph (APG) kernel and the Approximate Subgraph Matching (ASM) kernel have been shown to be suitable for classifying general graphs with cycles, such as the enhanced dependency parse graph of a sentence. In this work, we present a high-performance Chemical-Induced Disease (CID) relation extraction system. We present a comparative study of kernel methods for the CID task and also extend our study to the Protein-Protein Interaction (PPI) extraction task, an important biomedical relation extraction task. We discuss novel modifications to the ASM kernel to boost its performance and a method to apply graph kernels for extracting relations expressed across multiple sentences.
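To make the idea of a graph kernel over dependency graphs more concrete, the following is a minimal sketch in the spirit of an all-paths style kernel. It is not the authors' implementation: the one-label-per-node simplification, the truncated path sum, the edge weight of 0.9, and all function and variable names are assumptions chosen for illustration.

```python
from collections import defaultdict


def path_weight_matrix(n, edges, max_len=None):
    """Sum the weights of all paths of length 1..max_len between node pairs.

    edges maps (src, dst) -> weight. Truncating the series A + A^2 + ...
    is a simplifying assumption; it is exact for acyclic graphs when
    max_len >= n - 1 and only approximate for graphs with cycles.
    """
    max_len = max_len or n
    adj = [[edges.get((i, j), 0.0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in adj]   # current power A^k
    total = [row[:] for row in adj]   # running sum A + ... + A^k
    for _ in range(max_len - 1):
        power = [
            [sum(power[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return total


def label_pair_features(labels, edges, max_len=None):
    """Aggregate path weights by (source label, target label)."""
    weights = path_weight_matrix(len(labels), edges, max_len)
    feats = defaultdict(float)
    for i, src in enumerate(labels):
        for j, dst in enumerate(labels):
            if weights[i][j]:
                feats[(src, dst)] += weights[i][j]
    return feats


def all_paths_kernel(g1, g2):
    """Kernel value: dot product of the two label-pair feature maps."""
    f1 = label_pair_features(*g1)
    f2 = label_pair_features(*g2)
    return sum(v * f2[k] for k, v in f1.items() if k in f2)


# Hypothetical toy graphs: (node labels, weighted directed edges).
g_induces = (["CHEMICAL", "induces", "DISEASE"], {(0, 1): 0.9, (1, 2): 0.9})
g_treats = (["CHEMICAL", "treats", "DISEASE"], {(0, 1): 0.9, (1, 2): 0.9})

print(all_paths_kernel(g_induces, g_induces))
print(all_paths_kernel(g_induces, g_treats))
```

In this toy setting, two graphs are similar to the extent that they share label pairs connected by heavily weighted paths, so the graph compared with itself scores higher than the pair that differs in the connecting word. A kernel of this kind would typically be plugged into a kernel-based classifier such as a support vector machine.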

RESULTS

Our system for CID relation extraction attains an F-score of 60% without using external knowledge sources or task-specific heuristics or rules. In comparison, the state-of-the-art Chemical-Disease Relation Extraction system achieves an F-score of 56% using an ensemble of multiple machine learning methods, which is then boosted to 61% with a rule-based system employing task-specific post-processing rules. For the CID task, graph kernels outperform tree kernels substantially, and the best performance is obtained with the APG kernel, which attains an F-score of 60%, followed by the ASM kernel at 57%. The performance difference between the ASM and APG kernels for CID sentence-level relation extraction is not significant. In our evaluation of ASM for the PPI task, ASM performed better than the APG kernel on the BioInfer dataset in the Area Under Curve (AUC) measure (74% vs 69%). However, for all the other PPI datasets, namely AIMed, HPRD50, IEPA and LLL, ASM is substantially outperformed by the APG kernel in both F-score and AUC.
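For readers unfamiliar with the metric, the sketch below shows how an F-score is computed from prediction counts, assuming the balanced F1 variant (the abstract does not state which variant is used). The counts in the example are hypothetical and are not taken from the paper.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 60 correct relations, 40 spurious, 40 missed.
print(f_score(tp=60, fp=40, fn=40))
```

Because the score is a harmonic mean, it only rises when precision and recall improve together, which is why it is the usual headline figure for relation extraction systems.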

CONCLUSIONS

We demonstrate high-performance Chemical-Induced Disease relation extraction without employing external knowledge sources or task-specific heuristics. Our work shows that graph kernels are effective in extracting relations that are expressed across multiple sentences. We also show that the graph kernels, namely the ASM and APG kernels, substantially outperform the tree kernels. Among the graph kernels, we show that the ASM kernel is effective for biomedical relation extraction, with performance comparable to the APG kernel on datasets such as CID sentence-level relation extraction and BioInfer in PPI. Overall, the APG kernel is shown to be significantly more accurate than the ASM kernel, achieving better performance on most datasets.
