Protein interaction sentence detection using multiple semantic kernels.

Authors

Polajnar Tamara, Damoulas Theodoros, Girolami Mark

Affiliation

School of Computing Science, University of Glasgow, Glasgow, UK.

Publication information

J Biomed Semantics. 2011 May 14;2(1):1. doi: 10.1186/2041-1480-2-1.

DOI:10.1186/2041-1480-2-1

PMID:21569604

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3116455/

Abstract

BACKGROUND

Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Many state-of-the-art approaches for this task employ kernel classification methods, in particular support vector machines (SVMs). In this work we propose a novel data integration approach that utilises semantic kernels and a kernel classification method that is a probabilistic analogue to SVMs. Semantic kernels are created from statistical information gathered from large amounts of unlabelled text using lexical semantic models. Several semantic kernels are then fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether the use of combinations of kernels constructed using word-based semantic models can improve PPI sentence detection.
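The idea of fusing several kernels into one composite classification space can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes bag-of-words sentence vectors, a hypothetical word-by-context count matrix `C` standing in for a lexical semantic model, and a fixed convex combination of Gram matrices. The function names, the toy data, and the weights are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def gaussian_kernel(X, gamma=1.0):
    """Plain Gaussian (RBF) kernel between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def semantic_kernel(X, C):
    """Kernel X S X^T with S = Cn Cn^T, where Cn holds row-normalised
    word-by-context counts (an assumed stand-in for a semantic model)."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    Cn = C / np.where(norms == 0, 1.0, norms)
    P = X @ Cn          # project sentences into the semantic space
    return P @ P.T      # positive semi-definite by construction

def combine_kernels(kernels, weights):
    """Convex combination of Gram matrices (weights are rescaled to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Toy data (hypothetical): 3 sentences over a 4-word vocabulary.
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Hypothetical word-by-context counts gathered from unlabelled text.
C = np.array([[2, 0, 1],
              [1, 1, 0],
              [0, 3, 1],
              [0, 0, 4]], dtype=float)

K = combine_kernels(
    [gaussian_kernel(X, gamma=0.5), semantic_kernel(X, C)],
    weights=[0.5, 0.5],
)
```

The resulting matrix `K` could be passed to any classifier that accepts a precomputed kernel. In the paper the combination weights are inferred by the probabilistic classifier rather than fixed by hand as they are here.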

RESULTS

We show that combinations of semantic kernels lead to statistically significant improvements in recognition rates and receiver operating characteristic (ROC) scores over the plain Gaussian kernel, when applied to a well-known labelled collection of abstracts. The proposed kernel composition method also allows us to automatically infer the most discriminative kernels.
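For readers unfamiliar with ROC scores, the following minimal sketch computes the area under the ROC curve as the fraction of positive/negative pairs that a classifier ranks correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier outputs for five sentences (1 = PPI sentence).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so two kernels can be compared by evaluating their classifiers' outputs on the same held-out sentences.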

CONCLUSIONS

The results from this paper indicate that using semantic information from unlabelled text, and combinations of such information, can be valuable for classification of short texts such as PPI sentences. This study, however, is only a first step in evaluation of semantic kernels and probabilistic multiple kernel learning in the context of PPI detection. The method described herein is modular, and can be applied with a variety of feature types, kernels, and semantic models, in order to facilitate full extraction of interacting proteins.
