
Enriching a biomedical event corpus with meta-knowledge annotation.

Affiliation

National Centre for Text Mining, Manchester Interdisciplinary Biocentre, School of Computer Science, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.

Publication information

BMC Bioinformatics. 2011 Oct 10;12:393. doi: 10.1186/1471-2105-12-393.

DOI:10.1186/1471-2105-12-393
PMID:21985429
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3222636/
Abstract

BACKGROUND

Biomedical papers contain rich information about entities, facts and events of biological relevance. To discover these automatically, we use text mining techniques, which rely on annotated corpora for training. In order to extract protein-protein interactions, genotype-phenotype/gene-disease associations, etc., we rely on event corpora that are annotated with classified, structured representations of important facts and findings contained within text. These provide an important resource for the training of domain-specific information extraction (IE) systems, to facilitate semantic-based searching of documents. Correct interpretation of these events is not possible without additional information, e.g., does an event describe a fact, a hypothesis, an experimental result or an analysis of results? How confident is the author about the validity of her analyses? These and other types of information, which we collectively term meta-knowledge, can be derived from the context of the event.

RESULTS

We have designed an annotation scheme for meta-knowledge enrichment of biomedical event corpora. The scheme is multi-dimensional, in that each event is annotated for 5 different aspects of meta-knowledge that can be derived from the textual context of the event. Textual clues used to determine the values are also annotated. The scheme is intended to be general enough to allow integration with different types of bio-event annotation, whilst being detailed enough to capture important subtleties in the nature of the meta-knowledge expressed in the text. We report here on both the main features of the annotation scheme and its application to the GENIA event corpus (1000 abstracts with 36,858 events). High levels of inter-annotator agreement have been achieved, with Kappa values in the range 0.84-0.93.
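
For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how a Kappa score can be computed for one annotation dimension. It assumes Cohen's kappa for two annotators; the abstract does not say which Kappa variant was used. The function name `cohen_kappa` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The abstract only
    reports "Kappa", so this is an illustration, not the authors' code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten events on one meta-knowledge dimension.
annotator_a = ["fact", "fact", "analysis", "fact", "analysis",
               "fact", "fact", "analysis", "fact", "fact"]
annotator_b = ["fact", "fact", "fact", "fact", "analysis",
               "fact", "fact", "analysis", "fact", "fact"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Because Kappa discounts agreement expected by chance, it comes out lower than raw percentage agreement. In this toy data the annotators agree on 9 of 10 items, yet the score is well below 0.9.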

CONCLUSION

By augmenting event annotations with meta-knowledge, more sophisticated IE systems can be trained, which allow interpretative information to be specified as part of the search criteria. This can assist in a number of important tasks, e.g., finding new experimental knowledge to facilitate database curation, enabling textual inference to detect entailments and contradictions, etc. To our knowledge, our scheme is unique within the field with regard to the diversity of meta-knowledge aspects annotated for each event.
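
As an illustration of what specifying interpretative information as part of the search criteria could look like, here is a small sketch under stated assumptions. The `Event` class, its field names, the label values and the `search` helper are all hypothetical. Only two dimensions are modelled (kind of knowledge and author certainty), because the abstract does not name the five aspects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A toy event record; field names and label values are illustrative."""
    event_type: str       # e.g. "regulation", "binding"
    trigger: str          # word in the text that signals the event
    knowledge_type: str   # e.g. "fact", "hypothesis", "result", "analysis"
    certainty: str        # e.g. "high", "low"


def search(events, **criteria):
    """Return events whose attributes match every given criterion."""
    return [
        e for e in events
        if all(getattr(e, name) == value for name, value in criteria.items())
    ]


# Hypothetical annotated events.
events = [
    Event("regulation", "activates", "result", "high"),
    Event("regulation", "may induce", "hypothesis", "low"),
    Event("binding", "binds", "fact", "high"),
    Event("regulation", "suggests", "analysis", "low"),
]

# Find regulation events reported as confident experimental results.
hits = search(events, event_type="regulation",
              knowledge_type="result", certainty="high")
for e in hits:
    print(e.trigger)
```

A curator looking for new experimental knowledge could filter on the result label, while a contradiction detector might compare events that share a type but differ in certainty.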
