Gold-standard ontology-based anatomical annotation in the CRAFT Corpus.

Authors

Bada Michael, Vasilevsky Nicole, Baumgartner William A, Haendel Melissa, Hunter Lawrence E

Affiliations

School of Medicine, Department of Pharmacology, University of Colorado Anschutz Medical Campus, 12801 E. 17th Ave., P.O. Box 6511, MS 8303, Aurora, CO 80045-0511, USA.

Ontology Development Group, Library, Oregon Health & Science University, 318 SW Sam Jackson, Park Road, Portland, OR 97239, USA.

Publication information

Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax087.

DOI:10.1093/database/bax087
PMID:31725864
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7243923/
Abstract

Gold-standard annotated corpora have become important resources for the training and testing of natural-language-processing (NLP) systems designed to support biocuration efforts, and ontologies are increasingly used to facilitate curational consistency and semantic integration across disparate resources. Bringing together the respective power of these, the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of full-length, open-access biomedical journal articles with extensive manually created syntactic, formatting and semantic markup, was previously created and released. This initial public release has already been used in multiple projects to drive development of systems focused on a variety of biocuration, search, visualization, and semantic and syntactic NLP tasks. Building on its demonstrated utility, we have expanded the CRAFT Corpus with a large set of manually created semantic annotations relying on Uberon, an ontology representing anatomical entities and life-cycle stages of multicellular organisms across species as well as types of multicellular organisms defined in terms of life-cycle stage and sexual characteristics. This newly created set of annotations, which has been added for v2.1 of the corpus, is by far the largest publicly available collection of gold-standard anatomical markup and is the first large-scale effort at manual markup of biomedical text relying on the entirety of an anatomical terminology, as opposed to annotation with a small number of high-level anatomical categories, as performed in previous corpora. In addition to presenting and discussing this newly available resource, we apply it to provide a performance baseline for the automatic annotation of anatomical concepts in biomedical text using a prominent concept recognition system. The full corpus, released with a CC BY 3.0 license, may be downloaded from http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. 
Database URL: http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
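
The abstract mentions a performance baseline for automatic anatomical concept annotation but does not say how it was scored. As a minimal sketch only, the following assumes strict matching, where a predicted annotation counts as correct when its document, character span, and concept identifier all equal those of a gold annotation. The `Annotation` type, the `precision_recall_f1` function, the document ID, the offsets, and the Uberon identifiers shown are illustrative assumptions, not the CRAFT distribution format or the authors' evaluation procedure.

```python
from typing import Iterable, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical record for one concept annotation (not the CRAFT file format)."""
    doc_id: str      # illustrative document identifier
    start: int       # character offset where the mention begins
    end: int         # character offset where the mention ends (exclusive)
    concept_id: str  # ontology identifier, e.g. an Uberon class


def precision_recall_f1(
    gold: Iterable[Annotation], predicted: Iterable[Annotation]
) -> Tuple[float, float, float]:
    """Score predictions against gold annotations under strict matching.

    Assumption: a prediction is a true positive only if document, span,
    and concept identifier are all identical to a gold annotation.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Illustrative data: three gold annotations, three predictions, two exact matches.
gold = [
    Annotation("doc1", 10, 15, "UBERON:0000948"),
    Annotation("doc1", 40, 45, "UBERON:0002107"),
    Annotation("doc1", 80, 85, "UBERON:0000955"),
]
predicted = [
    Annotation("doc1", 10, 15, "UBERON:0000948"),  # exact match
    Annotation("doc1", 40, 45, "UBERON:0002107"),  # exact match
    Annotation("doc1", 80, 85, "UBERON:0000948"),  # right span, wrong concept
]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Strict matching is the simplest choice for a sketch. A relaxed variant that credits overlapping spans or ontologically related concepts would give different figures, so the scoring scheme should always be stated alongside any reported baseline.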
