The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities.

Authors

Lavergne Thomas, Grouin Cyril, Zweigenbaum Pierre

Publication information

BMC Bioinformatics. 2015;16 Suppl 10(Suppl 10):S6. doi: 10.1186/1471-2105-16-S10-S6. Epub 2015 Jul 13.

DOI:10.1186/1471-2105-16-S10-S6
PMID:26201352
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4511182/

Abstract

BACKGROUND

The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect of the situation. The present work specifically addresses issues raised by this situation: (i) how to detect these co-reference links and associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) what context around which entity mentions contributes to relation detection when co-reference chains are provided.
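
Point (ii) can be illustrated with a small sketch. The abstract does not say how the authors build their training examples, so the following is only one plausible reading: a gold relation between two mentions is propagated to every pair of mentions drawn from their co-reference chains, and all remaining bacteria/location pairs become negative examples. The function names, the mention identifiers (`T1`, `T2`, ...) and the type labels `Bacteria`, `Habitat` and `Geographical` are illustrative assumptions, not taken from the paper.

```python
from itertools import product

LOCATION_TYPES = ("Habitat", "Geographical")  # assumed type labels


def build_chain_index(chains):
    """Map each mention id to the frozenset of ids in its co-reference chain."""
    index = {}
    for chain in chains:
        members = frozenset(chain)
        for mention_id in members:
            index[mention_id] = members
    return index


def chain_of(mention_id, index):
    """Return the chain of a mention; a mention without a chain is a singleton."""
    return index.get(mention_id, frozenset([mention_id]))


def label_candidate_pairs(mentions, chains, gold_relations):
    """Label every (bacteria, location) mention pair as positive or negative.

    mentions: dict mapping mention id -> entity type
    chains: iterable of iterables of co-referring mention ids
    gold_relations: set of (bacteria_id, location_id) pairs
    """
    index = build_chain_index(chains)
    # Hypothetical propagation rule: a gold relation holds for every pair
    # formed from the two chains involved.
    positives = set()
    for bact, loc in gold_relations:
        positives.update(product(chain_of(bact, index), chain_of(loc, index)))
    bacteria = [m for m, t in mentions.items() if t == "Bacteria"]
    locations = [m for m, t in mentions.items() if t in LOCATION_TYPES]
    return {(b, l): (b, l) in positives for b, l in product(bacteria, locations)}


if __name__ == "__main__":
    mentions = {"T1": "Bacteria", "T2": "Bacteria",
                "T3": "Habitat", "T4": "Geographical"}
    chains = [["T1", "T2"]]          # T2 ("the bacteria") co-refers with T1
    gold = {("T1", "T3")}            # only T1 is annotated as located in T3
    for pair, label in sorted(label_candidate_pairs(mentions, chains, gold).items()):
        print(pair, "positive" if label else "negative")
```

Without the chain, the pair (`T2`, `T3`) would be treated as a negative example even though it expresses the same fact as the gold pair, which is the kind of noise that co-reference information is meant to remove.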

RESULTS

We present experiments and results obtained both with gold entity mentions (task 2 of BioNLP-ST 2013) and with automatically detected entity mentions (end-to-end system, in task 3 of BioNLP-ST 2013). Our supervised mention detection system uses a linear chain Conditional Random Fields classifier, and our relation detection system relies on a Logistic Regression (aka Maximum Entropy) classifier. They use a set of morphological, morphosyntactic and semantic features. To minimize false inferences, co-reference resolution applies a set of heuristic rules designed to optimize precision. They take into account the types of the detected entity mentions, and take advantage of the didactic nature of the texts of the corpus, where a large proportion of bacteria naming is fairly explicit (although natural referring expressions such as "the bacteria" are common). The resulting system achieved a 0.495 F-measure on the official test set when taking as input the gold entity mentions, and a 0.351 F-measure when taking as input entity mentions predicted by our CRF system, both of which are above the best BioNLP-ST 2013 participant system.
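
To make the idea of a precision-oriented heuristic rule concrete, here is a minimal sketch of one such rule. It is not the authors' rule set, which the abstract does not list: it simply links a generic referring expression such as "the bacteria" to the most recent explicitly named Bacteria mention, and declines to link when no named mention precedes it. The list of generic expressions, the function name and the input layout are assumptions made for illustration.

```python
# Assumed, non-exhaustive list of generic referring expressions.
GENERIC_BACTERIA = {"the bacteria", "the bacterium", "the organism", "this bacterium"}


def resolve_generic_mentions(mentions):
    """Link generic Bacteria mentions to the latest explicitly named one.

    mentions: list of (mention_id, entity_type, text) in document order.
    Returns a list of (anaphor_id, antecedent_id) links.
    """
    links = []
    last_named = None
    for mention_id, entity_type, text in mentions:
        if entity_type != "Bacteria":
            continue  # the rule only applies within a single entity type
        if text.lower() in GENERIC_BACTERIA:
            # Favor precision: emit a link only if a named antecedent exists.
            if last_named is not None:
                links.append((mention_id, last_named))
        else:
            last_named = mention_id
    return links


if __name__ == "__main__":
    doc = [
        ("T1", "Bacteria", "Bifidobacterium longum"),
        ("T2", "Habitat", "human gut"),
        ("T3", "Bacteria", "the bacteria"),
        ("T4", "Bacteria", "Escherichia coli"),
        ("T5", "Bacteria", "The organism"),
    ]
    print(resolve_generic_mentions(doc))
```

A rule of this shape relies on the didactic style noted above: when most bacteria are named explicitly, the nearest named mention is usually a safe antecedent, and skipping doubtful cases costs recall rather than precision.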

CONCLUSIONS

We show that co-reference resolution substantially improves over a baseline system which does not use co-reference information: about 3.5 F-measure points on the test corpus for the end-to-end system (5.5 points on the development corpus) and 7 F-measure points on both development and test corpora when gold mentions are used. While this outperforms the best published system on the BioNLP-ST 2013 Bacteria Biotope dataset, we consider that it provides mostly a stronger baseline from which more work can be started. We also emphasize the importance and difficulty of designing a comprehensive gold standard co-reference annotation, which we explain is a key point to further progress on the task.
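
For readers less familiar with the metric, the F-measure quoted here is the harmonic mean of precision and recall over predicted relations, so a gain of 3.5 F-measure points corresponds to 0.035 on the 0 to 1 scale used in the Results. The sketch below computes it over sets of relation tuples with exact matching; the official BioNLP-ST 2013 scorer may apply different matching criteria, and the function name and tuple layout are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of relation tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("T1", "T3"), ("T2", "T3"), ("T4", "T6"), ("T5", "T6")}
    predicted = {("T1", "T3"), ("T2", "T3"), ("T1", "T6")}
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```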

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/644d8cf0cbfc/1471-2105-16-S10-S6-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/31a41392b362/1471-2105-16-S10-S6-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/0070432098a3/1471-2105-16-S10-S6-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/6b8bbac0d395/1471-2105-16-S10-S6-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/7744632903f3/1471-2105-16-S10-S6-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/8e4ca6fb18a1/1471-2105-16-S10-S6-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d79f/4511182/b2208acc1e35/1471-2105-16-S10-S6-7.jpg