Exploiting graph kernels for high performance biomedical relation extraction.

Authors

Panyam Nagesh C, Verspoor Karin, Cohn Trevor, Ramamohanarao Kotagiri

Affiliation

School of Computing and Information Systems, University of Melbourne, Melbourne, Australia.

Publication information

J Biomed Semantics. 2018 Jan 30;9(1):7. doi: 10.1186/s13326-017-0168-3.


DOI: 10.1186/s13326-017-0168-3
PMID: 29382397
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5791373/

Abstract

BACKGROUND: Relation extraction from biomedical publications is an important task in the semantic mining of text. Kernel methods for supervised relation extraction are often preferred over manual feature engineering when classifying highly ordered structures, such as the trees and graphs obtained from syntactic parsing of a sentence. Tree kernels such as the Subset Tree Kernel and the Partial Tree Kernel have been shown to be effective for classifying constituency parse trees and basic dependency parse graphs of a sentence. Graph kernels such as the All Path Graph (APG) kernel and the Approximate Subgraph Matching (ASM) kernel have been shown to be suitable for classifying general graphs with cycles, such as the enhanced dependency parse graph of a sentence. In this work, we present a high-performance Chemical-Induced Disease (CID) relation extraction system. We present a comparative study of kernel methods for the CID task and extend the study to Protein-Protein Interaction (PPI) extraction, another important biomedical relation extraction task. We discuss novel modifications to the ASM kernel that boost its performance, and a method for applying graph kernels to relations expressed across multiple sentences.

RESULTS: Our system for CID relation extraction attains an F-score of 60% without using external knowledge sources or task-specific heuristics or rules. In comparison, the state-of-the-art Chemical-Disease Relation Extraction system achieves an F-score of 56% using an ensemble of multiple machine learning methods, which is then boosted to 61% by a rule-based system employing task-specific post-processing rules. For the CID task, graph kernels outperform tree kernels substantially; the best performance is obtained with the APG kernel, which attains an F-score of 60%, followed by the ASM kernel at 57%. The performance difference between the ASM and APG kernels for CID sentence-level relation extraction is not significant. In our evaluation of ASM for the PPI task, ASM performed better than the APG kernel on the BioInfer dataset in the Area Under Curve (AUC) measure (74% vs 69%). However, on all the other PPI datasets, namely AIMed, HPRD50, IEPA and LLL, ASM is substantially outperformed by the APG kernel in both F-score and AUC.

CONCLUSIONS: We demonstrate high-performance Chemical-Induced Disease relation extraction without employing external knowledge sources or task-specific heuristics. Our work shows that graph kernels are effective in extracting relations that are expressed in multiple sentences. We also show that the graph kernels, namely the ASM and APG kernels, substantially outperform the tree kernels. Among the graph kernels, we show that the ASM kernel is effective for biomedical relation extraction, with performance comparable to the APG kernel on datasets such as CID sentence-level relation extraction and BioInfer in PPI. Overall, the APG kernel is shown to be significantly more accurate than the ASM kernel, achieving better performance on most datasets.
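
To make the idea of a graph kernel over dependency parses concrete, the following is a minimal sketch in Python. It is not the APG or ASM kernel described in the abstract. It is a much simpler stand-in that scores two graphs by the labeled edges they share. The edge encoding as (head, label, dependent) triples, the function names, and the three example graphs are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def edge_features(edges):
    """Count each (head, dependency label, dependent) triple in a graph."""
    return Counter(edges)


def toy_graph_kernel(edges_a, edges_b):
    """Dot product of edge-count vectors.

    Because it is an explicit dot product, it is a valid (positive
    semi-definite) kernel, but it ignores paths and approximate matches,
    which the APG and ASM kernels are designed to capture.
    """
    fa, fb = edge_features(edges_a), edge_features(edges_b)
    return sum(count * fb[edge] for edge, count in fa.items())


def normalized_kernel(edges_a, edges_b):
    """Cosine-style normalization so that graph size does not dominate."""
    denom = sqrt(toy_graph_kernel(edges_a, edges_a)
                 * toy_graph_kernel(edges_b, edges_b))
    return toy_graph_kernel(edges_a, edges_b) / denom if denom else 0.0


# Hypothetical dependency graphs with entity mentions replaced by type labels.
g1 = [("induces", "nsubj", "CHEMICAL"), ("induces", "dobj", "DISEASE")]
g2 = [("induces", "nsubj", "CHEMICAL"), ("induces", "dobj", "DISEASE"),
      ("DISEASE", "amod", "severe")]
g3 = [("treats", "nsubj", "CHEMICAL"), ("treats", "dobj", "DISEASE")]

print(toy_graph_kernel(g1, g2))   # shared edges between g1 and g2
print(toy_graph_kernel(g1, g3))   # no shared edges
print(normalized_kernel(g1, g2))
```

A matrix of such pairwise kernel values over labeled training graphs is the kind of input a kernel classifier such as a support vector machine can consume, which is the general setting the abstract refers to as kernel methods for supervised relation extraction.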


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9a5/5791373/f430d22c60e7/13326_2017_168_Fig1_HTML.jpg
