Huang Minlie, Zhu Xiaoyan, Hao Yu, Payan Donald G, Qu Kunbin, Li Ming
State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China.
Bioinformatics. 2004 Dec 12;20(18):3604-12. doi: 10.1093/bioinformatics/bth451. Epub 2004 Jul 29.
Although several databases store protein-protein interactions, most such data still exist only in the scientific literature, scattered across natural-language text that resists data mining. Extracting protein pathways from the literature therefore takes considerable time and labor. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.
We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and the key verbs that describe protein interactions. A matching algorithm then uses these patterns to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and a precision rate of 80.5%.
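The abstract does not give the scoring scheme or the pattern representation, so the following is only an illustrative sketch of how dynamic programming can align two sentences and keep their shared tokens as a candidate pattern. It uses a standard global alignment (Needleman-Wunsch style); the function names `align` and `common_pattern`, the `PROTEIN` placeholder token, and the match/mismatch/gap scores are all assumptions made for illustration, not the authors' implementation.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists with dynamic programming.

    Returns (score, pairs), where pairs is a list of (token_a, token_b)
    and None marks a gap. Scores are assumed values, not from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two tokens
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def common_pattern(pairs):
    """Keep only the tokens that both sentences share at aligned positions."""
    return [x for x, y in pairs if x is not None and x == y]


# Hypothetical sentences with protein names already replaced by a placeholder.
s1 = ["PROTEIN", "interacts", "with", "PROTEIN"]
s2 = ["PROTEIN", "directly", "interacts", "with", "PROTEIN"]
score, pairs = align(s1, s2)
print(score, common_pattern(pairs))
```

In this toy case the alignment skips the extra word "directly" and the shared tokens form a template around the key verb "interacts". A real system would need a scoring scheme tuned to the task and a separate step for matching such templates against new sentences.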
The program is available on request from the authors.