Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

Authors

Oellrich Anika, Collier Nigel, Smedley Damian, Groza Tudor

Affiliations

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom.

EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, United Kingdom; National Institute of Informatics, Tokyo 101-8430, Japan.

Publication information

PLoS One. 2015 Jan 21;10(1):e0116040. doi: 10.1371/journal.pone.0116040. eCollection 2015.

DOI: 10.1371/journal.pone.0116040
PMID: 25607983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4301805/

Abstract

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we build a silver standard annotation set from the individual systems' output and assess its quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO Annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when they are combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently of the text type, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources.
The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora must be requested from the individual corpus providers.
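
The abstract does not spell out how the silver standard is derived from the four systems' output or how matches are counted. As an illustration only, the sketch below assumes a simple voting scheme (an annotation enters the silver standard when at least `min_votes` systems propose exactly the same span and concept) and scores a system against a reference set with precision, recall and F1. The annotation tuple layout, the `min_votes` threshold, the function names and the toy data are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Assumed layout: (document id, start offset, end offset, concept id).
Annotation = Tuple[str, int, int, str]


def build_silver_standard(system_outputs: Dict[str, Set[Annotation]],
                          min_votes: int = 2) -> Set[Annotation]:
    """Keep annotations proposed by at least `min_votes` systems (exact match)."""
    votes: Counter = Counter()
    for annotations in system_outputs.values():
        # Each system holds a set, so it votes at most once per annotation.
        votes.update(annotations)
    return {ann for ann, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted: Set[Annotation],
                        reference: Set[Annotation]) -> Tuple[float, float, float]:
    """Score `predicted` against `reference` using exact-match true positives."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data with hypothetical system names and concept identifiers.
    outputs = {
        "system_a": {("doc1", 0, 5, "X:1"), ("doc1", 10, 18, "X:2"), ("doc1", 20, 25, "X:3")},
        "system_b": {("doc1", 0, 5, "X:1"), ("doc1", 10, 18, "X:2")},
        "system_c": {("doc1", 0, 5, "X:1"), ("doc1", 30, 35, "X:4")},
    }
    silver = build_silver_standard(outputs, min_votes=2)
    for name, annotations in sorted(outputs.items()):
        p, r, f1 = precision_recall_f1(annotations, silver)
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Raising `min_votes` in such a scheme would be expected to trade recall for precision, which is one way to read the abstract's observation that systems with low recall can improve precision while lowering the combined F1-measure. Partial span overlap, which real evaluations often credit, is deliberately ignored here.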

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156a/4301805/8990b9999358/pone.0116040.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156a/4301805/870966729edc/pone.0116040.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156a/4301805/2060bc0ece7a/pone.0116040.g002.jpg
