Protein interaction sentence detection using multiple semantic kernels.

Authors

Polajnar Tamara, Damoulas Theodoros, Girolami Mark

Affiliation

School of Computing Science, University of Glasgow, Glasgow, UK.

Publication

J Biomed Semantics. 2011 May 14;2(1):1. doi: 10.1186/2041-1480-2-1.

Abstract

BACKGROUND

Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Many state-of-the-art approaches for this task employ kernel classification methods, in particular support vector machines (SVMs). In this work we propose a novel data integration approach that utilises semantic kernels and a kernel classification method that is a probabilistic analogue to SVMs. Semantic kernels are created from statistical information gathered from large amounts of unlabelled text using lexical semantic models. Several semantic kernels are then fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether the use of combinations of kernels constructed using word-based semantic models can improve PPI sentence detection.
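As an illustration of fusing several kernels into one composite classification space, here is a minimal sketch. The abstract does not state the fusion rule, so the convex weighted sum, the fixed weights, the toy bag-of-words matrix `X`, and the word-similarity matrix `S` are all assumptions made for the example, not the authors' method or data.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def composite_kernel(kernels, weights):
    """Convex combination of base kernel matrices (assumed fusion rule)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    w = w / w.sum()  # normalise so the weights sum to 1
    return sum(wi * K for wi, K in zip(w, kernels))

# Toy data: 4 "sentences" as word counts over a 3-word vocabulary (made up).
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Hypothetical word-to-word semantic similarity matrix; in practice it would be
# estimated from unlabelled text by a lexical semantic model.
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

K_plain = gaussian_kernel(X, X, gamma=0.5)        # plain Gaussian kernel
K_sem = gaussian_kernel(X @ S, X @ S, gamma=0.5)  # Gaussian kernel on smoothed features
K = composite_kernel([K_plain, K_sem], weights=[0.3, 0.7])
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so `K` can be passed to any kernel classifier. In a multiple kernel learning setting the weights would be inferred from data rather than fixed by hand as they are here.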

RESULTS

We show that combinations of semantic kernels lead to statistically significant improvements in recognition rates and receiver operating characteristic (ROC) scores over the plain Gaussian kernel, when applied to a well-known labelled collection of abstracts. The proposed kernel composition method also allows us to automatically infer the most discriminative kernels.
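For readers unfamiliar with ROC scores, the sketch below computes the area under the ROC curve as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. This assumes "ROC score" means area under the curve, which the abstract does not spell out; the function name `roc_auc` and the labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = sentence describes an interaction) and classifier scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; a sort-based computation is preferable on larger collections.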

CONCLUSIONS

The results from this paper indicate that using semantic information from unlabelled text, and combinations of such information, can be valuable for classification of short texts such as PPI sentences. This study, however, is only a first step in the evaluation of semantic kernels and probabilistic multiple kernel learning in the context of PPI detection. The method described herein is modular and can be applied with a variety of feature types, kernels, and semantic models in order to facilitate full extraction of interacting proteins.
