BioInfer: a corpus for information extraction in the biomedical domain.

Authors

Pyysalo Sampo, Ginter Filip, Heimonen Juho, Björne Jari, Boberg Jorma, Järvinen Jouni, Salakoski Tapio

Affiliation

Turku Centre for Computer Science (TUCS), University of Turku, Lemminkäisenkatu 14a, 20520 Turku, Finland.

Publication

BMC Bioinformatics. 2007 Feb 9;8:50. doi: 10.1186/1471-2105-8-50.

DOI: 10.1186/1471-2105-8-50
PMID: 17291334
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1808065/

Abstract

BACKGROUND

Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora.

RESULTS

We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation.

CONCLUSION

We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at http://www.it.utu.fi/BioInfer.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/9b9194a8efdf/1471-2105-8-50-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/49b0d1bd6c0a/1471-2105-8-50-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/271d921ee65b/1471-2105-8-50-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/a58f42f9758e/1471-2105-8-50-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/65cefcdf7402/1471-2105-8-50-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/6c3eef574eed/1471-2105-8-50-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/0ca5ec8c3dec/1471-2105-8-50-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/45085bb83fc0/1471-2105-8-50-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/91e2438a5973/1471-2105-8-50-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/0b69f5d8d75e/1471-2105-8-50-10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/698f/1808065/922731cde57c/1471-2105-8-50-11.jpg

Similar articles

1
BioInfer: a corpus for information extraction in the biomedical domain.
BMC Bioinformatics. 2007 Feb 9;8:50. doi: 10.1186/1471-2105-8-50.
2
Corpus annotation for mining biomedical events from literature.
BMC Bioinformatics. 2008 Jan 8;9:10. doi: 10.1186/1471-2105-9-10.
3
Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.
Int J Med Inform. 2006 Jun;75(6):430-42. doi: 10.1016/j.ijmedinf.2005.06.009. Epub 2005 Aug 11.
4
Construction of an annotated corpus to support biomedical information extraction.
BMC Bioinformatics. 2009 Oct 23;10:349. doi: 10.1186/1471-2105-10-349.
5
BioIE: extracting informative sentences from the biomedical literature.
Bioinformatics. 2005 May 1;21(9):2138-9. doi: 10.1093/bioinformatics/bti296. Epub 2005 Feb 2.
6
GENIA corpus--semantically annotated corpus for bio-textmining.
Bioinformatics. 2003;19 Suppl 1:i180-2. doi: 10.1093/bioinformatics/btg1023.
7
Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.
Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.
8
Concept annotation in the CRAFT corpus.
BMC Bioinformatics. 2012 Jul 9;13:161. doi: 10.1186/1471-2105-13-161.
9
Recognizing names in biomedical texts: a machine learning approach.
Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.
10
An environment for relation mining over richly annotated corpora: the case of GENIA.
BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-7-S3-S3.

Cited by

1
Reduction of supervision for biomedical knowledge discovery.
BMC Bioinformatics. 2025 Sep 1;26(1):225. doi: 10.1186/s12859-025-06187-0.
2
Machine learning to predict penumbra core mismatch in acute ischemic stroke using clinical note data.
NPJ Digit Med. 2025 Jun 6;8(1):340. doi: 10.1038/s41746-025-01703-1.
3
The influence of prompt engineering on large language models for protein-protein interaction identification in biomedical literature.
Sci Rep. 2025 May 3;15(1):15493. doi: 10.1038/s41598-025-99290-4.
4
DiMB-RE: mining the scientific literature for diet-microbiome associations.
J Am Med Inform Assoc. 2025 Jun 1;32(6):998-1006. doi: 10.1093/jamia/ocaf054.
5
Annotated corpus for traditional formula-disease relationships in biomedical articles.
Sci Data. 2025 Jan 7;12(1):26. doi: 10.1038/s41597-025-04377-2.
6
JTIS: enhancing biomedical document-level relation extraction through joint training with intermediate steps.
Database (Oxford). 2024 Dec 19;2024. doi: 10.1093/database/baae125.
7
CoNECo: a Corpus for Named Entity recognition and normalization of protein Complexes.
Bioinform Adv. 2024 Aug 20;4(1):vbae116. doi: 10.1093/bioadv/vbae116. eCollection 2024.
8
Optimized biomedical entity relation extraction method with data augmentation and classification using GPT-4 and Gemini.
Database (Oxford). 2024 Oct 9;2024. doi: 10.1093/database/baae104.
9
Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text.
Bioinform Adv. 2024 Sep 11;4(1):vbae133. doi: 10.1093/bioadv/vbae133. eCollection 2024.
10
STRING-ing together protein complexes: corpus and methods for extracting physical protein interactions from the biomedical literature.
Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae552.

References

1
Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches.
BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2105-7-S3-S2.
2
Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions.
Int J Med Inform. 2006 Jun;75(6):430-42. doi: 10.1016/j.ijmedinf.2005.06.009. Epub 2005 Aug 11.
3
Agreement, the f-measure, and reliability in information retrieval.
J Am Med Inform Assoc. 2005 May-Jun;12(3):296-8. doi: 10.1197/jamia.M1733. Epub 2005 Jan 31.
4
PASBio: predicate-argument structures for event extraction in molecular biology.
BMC Bioinformatics. 2004 Oct 19;5:155. doi: 10.1186/1471-2105-5-155.
5
Extracting human protein interactions from MEDLINE using a full-sentence parser.
Bioinformatics. 2004 Mar 22;20(5):604-11. doi: 10.1093/bioinformatics/btg452. Epub 2004 Jan 22.
6
Mining the biomedical literature in the genomic era: an overview.
J Comput Biol. 2003;10(6):821-55. doi: 10.1089/106652703322756104.
7
Adding a medical lexicon to an English Parser.
AMIA Annu Symp Proc. 2003;2003:639-43.
8
The Database of Interacting Proteins: 2004 update.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D449-51. doi: 10.1093/nar/gkh086.
9
Extraction of protein interaction information from unstructured text using a context-free grammar.
Bioinformatics. 2003 Nov 1;19(16):2046-53. doi: 10.1093/bioinformatics/btg279.
10
Accomplishments and challenges in literature data mining for biology.
Bioinformatics. 2002 Dec;18(12):1553-61. doi: 10.1093/bioinformatics/18.12.1553.