Constructing a semantic predication gold standard from the biomedical literature.

Affiliation

Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD, USA.

Publication information

BMC Bioinformatics. 2011 Dec 20;12:486. doi: 10.1186/1471-2105-12-486.

DOI:10.1186/1471-2105-12-486
PMID:22185221
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3281188/
Abstract

BACKGROUND

Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology.

RESULTS

We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by about 12 percentage points (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations.

CONCLUSIONS

While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
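
The agreement scores reported in the Results can be made concrete with a small sketch. The abstract does not state which agreement measure was used, so the code below assumes a pairwise F1 score over the sets of predications produced by two annotators, a common choice for relation annotation. The `Predication` layout, the `pairwise_f1` helper, and the toy annotations are all hypothetical and are not taken from the study. The last lines also work through the reported change from 0.415 to 0.536, separating the absolute gain from the relative one.

```python
from typing import NamedTuple


class Predication(NamedTuple):
    """A subject-PREDICATE-object triple tied to a sentence (illustrative layout)."""
    sentence_id: int
    subject: str    # concept, e.g. drawn from the UMLS Metathesaurus
    predicate: str  # relation, e.g. drawn from the UMLS Semantic Network
    obj: str


def pairwise_f1(a: set[Predication], b: set[Predication]) -> float:
    """Assumed agreement measure: F1 with annotator A as reference, B as response."""
    if not a and not b:
        return 1.0  # both annotators found nothing: treat as full agreement
    matched = len(a & b)  # exact-match triples only, no semantic equivalence
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Toy annotations, invented for illustration only.
annotator_a = {
    Predication(1, "Aspirin", "TREATS", "Headache"),
    Predication(1, "Aspirin", "INHIBITS", "Cyclooxygenase"),
    Predication(2, "Smoking", "CAUSES", "Lung cancer"),
}
annotator_b = {
    Predication(1, "Aspirin", "TREATS", "Headache"),
    Predication(2, "Smoking", "PREDISPOSES", "Lung cancer"),
}

agreement = pairwise_f1(annotator_a, annotator_b)
print(f"pairwise F1 agreement: {agreement:.3f}")

# Change in agreement reported in the abstract (0.415 to 0.536).
before, after = 0.415, 0.536
absolute_gain = after - before              # in agreement-score units
relative_gain = absolute_gain / before      # as a fraction of the starting score
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {relative_gain:.1%}")
```

Exact matching is the strictest possible criterion. The semantic equivalence criteria mentioned in the abstract would relax the `a & b` step so that differently worded but equivalent predications count as matches; that relaxation is not modeled here.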


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9574/3281188/d4f091c93e23/1471-2105-12-486-1.jpg
