Identifying discourse connectives in biomedical text.

Authors

Ramesh Balaji Polepalli, Yu Hong

Affiliation

University of Wisconsin Milwaukee, Milwaukee, WI.

Publication

AMIA Annu Symp Proc. 2010 Nov 13;2010:657-61.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3041460/

Abstract

Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of supervised machine-learning classifiers based on conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain, 1-million-token Penn Discourse Treebank (PDTB). We also performed cross-validation on biomedical articles (approximately 100K word tokens) that we annotated ourselves. The classifier trained on PDTB data attained an F1-score of 0.55 for identifying discourse connectives in biomedical text, while cross-validation on the biomedical text attained an F1-score of 0.69, a much better performance despite a much smaller training set. Our preliminary analysis suggests the existence of domain-specific features, and we speculate that domain-adaptation approaches may further improve performance.
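To make the reported F1-scores concrete, the sketch below shows one common way to frame connective identification as token-level sequence labeling and to score it. The abstract does not state the label scheme or the matching criterion used, so the B/I/O encoding, the exact-match span scoring, the example sentence, and the function names `bio_to_spans` and `span_f1` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect connective spans from a B/I/O tag sequence (assumed encoding)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match span F1 (assumed criterion): harmonic mean of P and R."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence with two connectives: "In contrast" and "because".
tokens = "In contrast , the mutant grew slowly because the gene was deleted .".split()
gold = ["B", "I", "O", "O", "O", "O", "O", "B", "O", "O", "O", "O", "O"]
# A hypothetical prediction that finds "In contrast" but misses "because".
pred = ["B", "I", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]

score = span_f1(gold, pred)  # precision 1.0, recall 0.5
```

In this toy case the prediction is fully precise but recovers only one of the two gold connectives, so the F1-score is pulled well below the precision. A CRF tagger would produce the `pred` sequence from token features; that step is omitted here because the abstract does not describe the feature set.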
