Discovering patterns to extract protein-protein interactions from the literature: Part II.

Authors

Hao Yu, Zhu Xiaoyan, Huang Minlie, Li Ming

Affiliation

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

Publication information

Bioinformatics. 2005 Aug 1;21(15):3294-300. doi: 10.1093/bioinformatics/bti493. Epub 2005 May 12.

Abstract

MOTIVATION

An enormous number of protein-protein interaction relationships are buried in millions of research articles published over the years, and the number is growing. Rediscovering them automatically is a challenging bioinformatics task. Solutions to this problem also reach far beyond bioinformatics.

RESULTS

We study a new approach that involves automatically discovering English expression patterns, optimizing them and using them to extract protein-protein interactions. In a sister paper, we described how to generate English expression patterns related to protein-protein interactions, and this approach alone has already achieved precision and recall rates significantly higher than those of other automatic systems. This paper continues to present our theory, focusing on how to improve the patterns. A minimum description length (MDL)-based pattern-optimization algorithm is designed to reduce and merge patterns. This has significantly increased generalization power, and hence the recall and precision rates, as confirmed by our experiments.
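To make the idea of MDL-based pattern merging concrete, here is a minimal toy sketch. It is not the authors' algorithm: the token-tuple pattern format, the `*` wildcard, the bit-cost model, and every function name below are assumptions made for illustration. It greedily merges two same-length patterns into one wildcard pattern whenever the merge shortens a two-part description length (bits to write the patterns plus bits to encode the sentences given the patterns).

```python
import math
from itertools import combinations

WILDCARD = "*"  # hypothetical marker for "any single token"

def matches(pattern, sentence):
    """A pattern matches a sentence of equal length, slot by slot."""
    return len(pattern) == len(sentence) and all(
        p == WILDCARD or p == s for p, s in zip(pattern, sentence)
    )

def description_length(patterns, sentences, vocab_size):
    """Two-part code length in bits under an assumed toy cost model."""
    token_bits = math.log2(vocab_size)
    slot_bits = math.log2(vocab_size + 1)  # vocabulary plus the wildcard symbol
    model_bits = sum(len(p) for p in patterns) * slot_bits
    index_bits = math.log2(len(patterns)) if patterns else 0.0
    data_bits = 0.0
    for s in sentences:
        fits = [p for p in patterns if matches(p, s)]
        if fits:
            # Use the most specific match; each wildcard slot must be spelled out.
            best = min(fits, key=lambda p: p.count(WILDCARD))
            data_bits += index_bits + best.count(WILDCARD) * token_bits
        else:
            # No pattern applies: spell out the whole sentence.
            data_bits += len(s) * token_bits
    return model_bits + data_bits

def merge(a, b):
    """Generalize two same-length patterns; differing slots become wildcards."""
    if len(a) != len(b):
        return None
    return tuple(x if x == y else WILDCARD for x, y in zip(a, b))

def optimize(patterns, sentences, vocab_size):
    """Greedily accept any pairwise merge that lowers the description length."""
    patterns = set(patterns)
    best = description_length(patterns, sentences, vocab_size)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(patterns), 2):
            merged = merge(a, b)
            if merged is None:
                continue
            candidate = (patterns - {a, b}) | {merged}
            bits = description_length(candidate, sentences, vocab_size)
            if bits < best:
                patterns, best, improved = candidate, bits, True
                break
    return patterns, best

# Toy input: three sentences already reduced to token tuples (made-up data).
sentences = [
    ("PROTEIN", "binds", "PROTEIN"),
    ("PROTEIN", "activates", "PROTEIN"),
    ("PROTEIN", "inhibits", "PROTEIN"),
]
initial_bits = description_length(set(sentences), sentences, vocab_size=16)
optimized, final_bits = optimize(sentences, sentences, vocab_size=16)
```

In this toy run the three literal patterns collapse into a single pattern with a wildcard in the verb slot, because one short pattern plus a few bits per sentence is cheaper than three full patterns. A realistic system would also need to guard against over-general patterns that match non-interaction sentences, which this sketch does not attempt.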

AVAILABILITY

http://spies.cs.tsinghua.edu.cn.
