Discovering patterns to extract protein-protein interactions from full texts.

Author information

Huang Minlie, Zhu Xiaoyan, Hao Yu, Payan Donald G, Qu Kunbin, Li Ming

Affiliation

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, University of Tsinghua, Beijing, 100084, China.

Publication information

Bioinformatics. 2004 Dec 12;20(18):3604-12. doi: 10.1093/bioinformatics/bth451. Epub 2004 Jul 29.

DOI:10.1093/bioinformatics/bth451

PMID:15284092

Abstract

MOTIVATION

Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts.

RESULTS

We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%.
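The abstract does not spell out how the alignment or the matching step works. Purely as an illustration of the general idea, the Python sketch below swaps in a longest-common-subsequence alignment (a basic dynamic programming technique, not necessarily the authors' algorithm) to derive a shared pattern from two sentences, then matches that pattern against a new sentence. The function names `tag`, `align`, and `extract`, the `PROTEIN` placeholder, the example sentences, and the small protein dictionary are all assumptions made for this example.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]  # (symbol used for alignment, original surface form)


def tag(sentence: str, proteins: Set[str]) -> List[Token]:
    """Tokenize on whitespace; dictionary hits become the symbol PROTEIN."""
    tokens = []
    for raw in sentence.split():
        word = raw.strip(".,;")
        if not word:
            continue
        symbol = "PROTEIN" if word in proteins else word.lower()
        tokens.append((symbol, word))
    return tokens


def align(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two symbol lists (dynamic programming)."""
    n, m = len(a), len(b)
    # score[i][j] holds the LCS length of a[i:] and b[j:].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = 1 + score[i + 1][j + 1]
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])
    # Walk the table from the start to recover the shared symbols in order.
    pattern, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pattern.append(a[i])
            i += 1
            j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1
        else:
            j += 1
    return pattern


def extract(pattern: List[str], tokens: List[Token]) -> Optional[List[str]]:
    """Match the pattern as a subsequence; return the protein names captured."""
    captured, k = [], 0
    for symbol, surface in tokens:
        if k < len(pattern) and symbol == pattern[k]:
            if symbol == "PROTEIN":
                captured.append(surface)
            k += 1
    return captured if k == len(pattern) else None


# Hypothetical dictionary and sentences, for illustration only.
proteins = {"Raf1", "MEK1", "Grb2", "Sos1", "p53", "MDM2"}
s1 = tag("Raf1 interacts with MEK1.", proteins)
s2 = tag("We found that Grb2 interacts directly with Sos1.", proteins)
pattern = align([sym for sym, _ in s1], [sym for sym, _ in s2])

found = extract(pattern, tag("Our data show that p53 interacts strongly with MDM2.", proteins))
missed = extract(pattern, tag("p53 is a tumor suppressor.", proteins))
print(pattern, found, missed)
```

A subsequence match like this tolerates intervening words such as adverbs, which is one plausible reason an alignment-derived pattern could generalize across sentences. A real system would also need scoring, pattern filtering, and handling of multi-word protein names, none of which this sketch attempts.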

AVAILABILITY

The program is available on request from the authors.
