Extraction of regulatory gene/protein networks from Medline.

Authors

Saric Jasmin, Jensen Lars Juhl, Ouzounova Rossitza, Rojas Isabel, Bork Peer

Affiliation

EML Research gGmbH D-69118 Heidelberg, Germany.

Publication information

Bioinformatics. 2006 Mar 15;22(6):645-50. doi: 10.1093/bioinformatics/bti597. Epub 2005 Jul 26.

DOI: 10.1093/bioinformatics/bti597
PMID: 16046493

Abstract

MOTIVATION

We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, contains information on several other equally important regulatory mechanisms, in particular phosphorylation, which we have now extended our rule-based system to extract as well.

RESULTS

This paper presents new results for the extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information [i.e. (de-)phosphorylation]. Precision remains stable, with a slight increase in recall. From almost one million PubMed abstracts related to four model organisms, we extract regulatory networks and binary phosphorylations comprising 3,319 relation chunks. The accuracy is 83-90% for gene expression relations and 86-95% for (de-)phosphorylation relations. To achieve this, we made use of an organism-specific resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus. For the domain in question, an accuracy of 96.4% was attained on POS tags. Note that the rules were developed for yeast and were successfully applied, with comparable accuracy, to both abstracts and full-text articles related to other organisms.
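As a rough illustration of what lexicon-driven, rule-based relation extraction can look like, the sketch below matches two simple sentence patterns (active and passive voice) around names drawn from a small lexicon. It is not the STRING-IE rule set: the lexicon entries, the trigger-word tables, the `extract_relations` function, and the whitespace-level tokenization are all assumptions made for this example, and the real system relies on POS tagging and a far larger organism-specific name resource.

```python
import re

# Hypothetical mini-lexicon of gene/protein names (illustrative only;
# the paper uses a much larger organism-specific resource).
LEXICON = {"Cdc28", "Swe1", "Pho85", "Pho4"}

# Hypothetical trigger words, mapped to the relation type they signal.
ACTIVE = {"phosphorylates": "phosphorylation",
          "dephosphorylates": "dephosphorylation"}
PASSIVE = {"phosphorylated": "phosphorylation",
           "dephosphorylated": "dephosphorylation"}


def extract_relations(sentence):
    """Return (agent, relation, target) triples matched by two toy rules."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    found = []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        # Rule 1 (active voice): NAME <verb> NAME
        if word in ACTIVE and 0 < i < len(tokens) - 1:
            agent, target = tokens[i - 1], tokens[i + 1]
            if agent in LEXICON and target in LEXICON:
                found.append((agent, ACTIVE[word], target))
        # Rule 2 (passive voice): NAME is <participle> by NAME
        if word in PASSIVE and 2 <= i < len(tokens) - 2:
            if tokens[i - 1].lower() == "is" and tokens[i + 1].lower() == "by":
                target, agent = tokens[i - 2], tokens[i + 2]
                if agent in LEXICON and target in LEXICON:
                    found.append((agent, PASSIVE[word], target))
    return found


if __name__ == "__main__":
    print(extract_relations("Swe1 phosphorylates Cdc28."))
    print(extract_relations("Pho4 is phosphorylated by Pho85."))
```

Requiring both arguments to appear in the lexicon is what keeps a rule like this precise; recall then depends on how many linguistic constructs the rules cover, which is the trade-off the abstract describes.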

AVAILABILITY

The revised GENIA corpus, the POS tagger, the extraction rules and the full sets of extracted relations are available from http://www.bork.embl.de/Docu/STRING-IE

Similar articles

1. Extraction of regulatory gene/protein networks from Medline.
   Bioinformatics. 2006 Mar 15;22(6):645-50. doi: 10.1093/bioinformatics/bti597. Epub 2005 Jul 26.
2. Large-scale extraction of gene regulation for model organisms in an ontological context.
   In Silico Biol. 2005;5(1):21-32.
3. Comparison of character-level and part of speech features for name recognition in biomedical texts.
   J Biomed Inform. 2004 Dec;37(6):423-35. doi: 10.1016/j.jbi.2004.08.008.
4. Recognizing names in biomedical texts: a machine learning approach.
   Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
5. Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.
   Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.
6. RelEx--relation extraction using dependency parse trees.
   Bioinformatics. 2007 Feb 1;23(3):365-71. doi: 10.1093/bioinformatics/btl616. Epub 2006 Dec 1.
7. Combination of text-mining algorithms increases the performance.
   Bioinformatics. 2006 Sep 1;22(17):2151-7. doi: 10.1093/bioinformatics/btl281. Epub 2006 Jun 9.
8. Gene symbol disambiguation using knowledge-based profiles.
   Bioinformatics. 2007 Apr 15;23(8):1015-22. doi: 10.1093/bioinformatics/btm056. Epub 2007 Feb 21.
9. Gene name identification and normalization using a model organism database.
   J Biomed Inform. 2004 Dec;37(6):396-410. doi: 10.1016/j.jbi.2004.08.010.
10. Protein names precisely peeled off free text.
    Bioinformatics. 2004 Aug 4;20 Suppl 1:i241-7. doi: 10.1093/bioinformatics/bth904.

Cited by

1. Dead-End protein expression, function, and mutation in cancer: a systematic review.
   Mol Biol Rep. 2025 Mar 7;52(1):291. doi: 10.1007/s11033-025-10325-5.
2. Automated recognition of functional compound-protein relationships in literature.
   PLoS One. 2020 Mar 3;15(3):e0220925. doi: 10.1371/journal.pone.0220925. eCollection 2020.
3. Automatic extraction of protein-protein interactions using grammatical relationship graph.
   BMC Med Inform Decis Mak. 2018 Jul 23;18(Suppl 2):42. doi: 10.1186/s12911-018-0628-4.
4. Using uncertainty to link and rank evidence from biomedical literature for model curation.
   Bioinformatics. 2017 Dec 1;33(23):3784-3792. doi: 10.1093/bioinformatics/btx466.
5. Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts.
   Front Bioeng Biotechnol. 2017 Aug 28;5:48. doi: 10.3389/fbioe.2017.00048. eCollection 2017.
6. Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.
   J Biomed Semantics. 2016 Apr 27;7:22. doi: 10.1186/s13326-016-0059-z. eCollection 2016.
7. Survey of Natural Language Processing Techniques in Bioinformatics.
   Comput Math Methods Med. 2015;2015:674296. doi: 10.1155/2015/674296. Epub 2015 Oct 7.
8. RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information.
   IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):17-29. doi: 10.1109/TCBB.2014.2372765.
9. Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks.
   J Am Med Inform Assoc. 2016 Mar;23(2):356-65. doi: 10.1093/jamia/ocv092. Epub 2015 Jul 29.
10. PALM-IST: Pathway Assembly from Literature Mining--an Information Search Tool.
    Sci Rep. 2015 May 19;5:10021. doi: 10.1038/srep10021.