Discovering patterns to extract protein-protein interactions from full texts.

Author information

Huang Minlie, Zhu Xiaoyan, Hao Yu, Payan Donald G, Qu Kunbin, Li Ming

Affiliation

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, University of Tsinghua, Beijing, 100084, China.

Publication information

Bioinformatics. 2004 Dec 12;20(18):3604-12. doi: 10.1093/bioinformatics/bth451. Epub 2004 Jul 29.

Abstract

MOTIVATION

Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.

RESULTS

We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%.
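The abstract does not give the scoring scheme or the pattern format, so the following is only a minimal sketch of the general idea: aligning two tokenized sentences with dynamic programming and keeping the shared tokens as a pattern. It uses a standard Needleman-Wunsch global alignment as a stand-in. The function name `align_to_pattern`, the `PROTEIN` placeholder, the `*` wildcard, and the score values are all assumptions made for illustration, not the authors' method.

```python
def align_to_pattern(a, b, match=2, mismatch=-1, gap=-1):
    """Globally align token lists a and b; return shared tokens with '*' for differences.

    Hypothetical illustration: scores and the '*' wildcard are assumed, not from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # skip a token of a
                score[i][j - 1] + gap,      # skip a token of b
            )

    # Trace back from the bottom-right corner, preferring the diagonal move.
    pattern = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pattern.append(a[i - 1] if a[i - 1] == b[j - 1] else "*")
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            pattern.append("*")
            i -= 1
        else:
            pattern.append("*")
            j -= 1
    if i > 0 or j > 0:
        pattern.append("*")  # unaligned leading tokens in either sentence
    pattern.reverse()

    # Collapse runs of wildcards into a single '*'.
    collapsed = []
    for tok in pattern:
        if tok == "*" and collapsed and collapsed[-1] == "*":
            continue
        collapsed.append(tok)
    return collapsed


s1 = "PROTEIN interacts with PROTEIN".split()
s2 = "PROTEIN directly interacts with the PROTEIN".split()
print(align_to_pattern(s1, s2))
# → ['PROTEIN', '*', 'interacts', 'with', '*', 'PROTEIN']
```

In this sketch, protein names are assumed to have been replaced by the `PROTEIN` placeholder beforehand using the dictionary of protein names, so the surviving pattern keeps the key verb ("interacts") and its surrounding context.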

AVAILABILITY

The program is available on request from the authors.
